
Fix incorrect dimension used for temporal weights generation #749


Merged
merged 2 commits into main from bugfix/748-group-avg on Apr 14, 2025

Conversation

tomvothecoder
Collaborator

@tomvothecoder tomvothecoder commented Apr 2, 2025

Description

This PR addresses an issue in the temporal group average API where weights were calculated by subtracting time bounds via positional indexing. That approach mistakenly assumes a fixed order and number of dimensions.

The Issue

For example, in issue #748, the time bounds have dimensions arranged as (member, time, bound). The current logic incorrectly uses the "time" dimension for calculating weights instead of the "bound" dimension. This error ultimately leads to the following downstream issue:

ValueError: cannot add coordinates with new dimensions to a DataArray
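To make the failure mode concrete, here is a minimal sketch with synthetic data (hypothetical values, not the dataset from #748) showing how positional indexing targets the wrong axis:

```python
# Bounds shaped (member, time, bound) instead of the assumed (time, bound).
import numpy as np
import xarray as xr

time_bnds = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 3, 2),
    dims=["member", "time", "bound"],
)

# Positional indexing subtracts along "time" here, not "bound", so the
# resulting "weights" have the wrong dimensions entirely.
wrong = time_bnds[:, 1] - time_bnds[:, 0]  # dims: ("member", "bound")

# Label-based indexing always targets the bounds dimension by name.
right = time_bnds.diff(dim="bound").squeeze(dim="bound")  # ("member", "time")
```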

The Solution

To resolve this, the calculation now uses label-based indexing via the new get_bounds_dim() function. This change ensures that the correct dimension ("bound") is identified and used, regardless of the order or number of dimensions in the time bounds array.
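As a rough illustration of the idea (a sketch only, not the actual xcdat implementation), a helper like get_bounds_dim() can locate the bounds dimension by name and size rather than by position:

```python
import numpy as np
import xarray as xr

def get_bounds_dim(coord: xr.DataArray, bounds: xr.DataArray) -> str:
    """Return the name of the bounds dimension in `bounds`.

    The bounds dimension is a dimension not shared with the parent
    coordinate and, per CF conventions, of size 2 (lower/upper bound).
    """
    candidates = [
        d for d in bounds.dims
        if d not in coord.dims and bounds.sizes[d] == 2
    ]
    if len(candidates) != 1:
        raise ValueError("Could not uniquely identify a bounds dimension.")
    return candidates[0]

# Works regardless of dimension order or extra dimensions like "member".
time = xr.DataArray(np.arange(3.0), dims=["time"])
time_bnds = xr.DataArray(
    np.zeros((4, 3, 2)), dims=["member", "time", "bound"]
)
print(get_bounds_dim(time, time_bnds))  # -> "bound"
```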

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. type: docs Updates to documentation labels Apr 2, 2025
@tomvothecoder tomvothecoder requested a review from Copilot April 2, 2025 20:26

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes an issue in the temporal group average API where weights were calculated using an incorrect dimension. The changes introduce label-based indexing for calculating temporal weights and add a utility function to identify the correct bounds dimension.

  • In xcdat/temporal.py, the subtraction-based weights computation is replaced with a diff along the dynamically determined bounds dimension.
  • In xcdat/bounds.py, a new function get_bounds_dim is added to identify valid bounds dimensions.
  • Updates in __init__.py and tests ensure the new functionality is correctly exposed and validated.

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File Description
xcdat/temporal.py Replaced fixed indexing with a diff on the correct bounds dimension.
xcdat/bounds.py Added get_bounds_dim utility to determine the correct bounds dimension.
xcdat/__init__.py Updated import to include get_bounds_dim.
tests/test_bounds.py Added tests for get_bounds_dim functionality.
Files not reviewed (1)
  • docs/api.rst: Language not supported

@tomvothecoder tomvothecoder self-assigned this Apr 2, 2025
@tomvothecoder tomvothecoder moved this from Todo to In Review in xCDAT Development Apr 2, 2025

codecov bot commented Apr 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (94b3a98) to head (58d9791).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #749   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1681      1687    +6     
=========================================
+ Hits          1681      1687    +6     


Collaborator Author

@tomvothecoder tomvothecoder left a comment


Hey @pochedls, this PR is ready for review. I re-ran your example and it works now. Unit tests have also been added.

Example

# %%
import xcdat as xc
import xarray as xr

# tutorial data
ds = xc.tutorial.open_dataset("tas_amon_access")
dsc = xr.concat((ds, ds, ds), dim="member")
dsc_avg = dsc.temporal.group_average("tas", freq="year")

# %%
# other data
fn = "/p/user_pub/climate_work/pochedley1/extreme2324/tlt90_Amon_E3SM-2-0_historical-ssp370_ensemble_gr_187001-202412.nc"
ds = xc.open_dataset(fn)
ds_avg = ds.temporal.group_average("tlt90", freq="year")

Output

print(ds_avg.time[0:10])

<xarray.DataArray 'time' (time: 10)> Size: 80B
array([cftime.Datetime360Day(1870, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1871, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1872, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1873, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1874, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1875, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1876, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1877, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1878, 1, 1, 0, 0, 0, 0, has_year_zero=True),
       cftime.Datetime360Day(1879, 1, 1, 0, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 80B 1870-01-01 00:00:00 ... 1879-01-01 00:00:00
Attributes:
    bounds:        time_bnds
    axis:          T
    realtopology:  linear

print(ds_avg.tlt90[0:10])

<xarray.DataArray 'tlt90' (member: 10, time: 155)> Size: 12kB
array([[-0.11043303, -0.16092832, -0.08829556, ...,  0.79177042,
         1.16704312,  0.86311752],
       [ 0.11128599, -0.05549273,  0.04180344, ...,  0.93199587,
         1.02952424,  1.03610563],
       [-0.11125527, -0.06464632,  0.08757703, ...,  0.83790361,
         1.01119917,  1.19474977],
       ...,
       [ 0.05384331,  0.04045577, -0.04728429, ...,  0.9439962 ,
         1.02731631,  0.76063299],
       [ 0.25416844,  0.07389999,  0.23717786, ...,  0.83700803,
         1.08495403,  0.85116314],
       [ 0.02509733, -0.0937595 , -0.06834454, ...,  1.04069364,
         1.14190511,  0.99018879]])
Coordinates:
  * member   (member) <U9 360B 'r1i1p1f1' 'r2i1p1f1' ... 'r9i1p1f1' 'r10i1p1f1'
  * time     (time) object 1kB 1870-01-01 00:00:00 ... 2024-01-01 00:00:00
Attributes:
    comment:        Temperature of the lower boundary of the atmosphere
    long_name:      tlt Equivalent MSU Brightness Temperature
    standard_name:  surface_temperature
    cell_methods:   area: time: mean
    cell_measures:  area: areacella
    units:          K
    operation:      temporal_avg
    mode:           group_average
    freq:           year
    weighted:       True

@tomvothecoder
Collaborator Author

@pochedls I wonder if assuming dimensional order and length affects the spatial averager/spatial bounds. Is there ever a case where lat/lon bounds have extra dimensions? (e.g., "member" with time bounds).

We assume fixed order of dimensions using positional indexing in spatial.py here:

pm_cells = np.where(domain_bounds[:, 1] - domain_bounds[:, 0] < 0)[0]

return np.abs(domain_bounds[:, 1] - domain_bounds[:, 0])

xcdat/xcdat/spatial.py

Lines 587 to 601 in c198620

if r_bounds[1] >= r_bounds[0]:
    # Case 1 (simple case): not wrapping around prime meridian (or
    # latitude axis).
    # Adjustments for above / right of region.
    d_bounds[d_bounds[:, 0] > r_bounds[1], 0] = r_bounds[1]
    d_bounds[d_bounds[:, 1] > r_bounds[1], 1] = r_bounds[1]
    # Adjustments for below / left of region.
    d_bounds[d_bounds[:, 0] < r_bounds[0], 0] = r_bounds[0]
    d_bounds[d_bounds[:, 1] < r_bounds[0], 1] = r_bounds[0]
else:
    # Case 2: wrapping around prime meridian [for longitude only]
    domain_lowers = d_bounds[:, 0]
    domain_uppers = d_bounds[:, 1]
    region_lower, region_upper = r_bounds
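For reference, the positional subtraction above could also be expressed label-based, mirroring the temporal fix. A hedged sketch (illustrative only, not the current xcdat/spatial.py implementation):

```python
import numpy as np
import xarray as xr

# Synthetic longitude bounds with the conventional (lon, bnds) layout.
lon_bnds = xr.DataArray(
    np.array([[0.0, 90.0], [90.0, 180.0], [180.0, 270.0]]),
    dims=["lon", "bnds"],
)

# Positional form (assumes dims are exactly [lon, bnds]):
widths_pos = np.abs(lon_bnds[:, 1] - lon_bnds[:, 0])

# Label-based form (independent of dimension order):
widths_lbl = np.abs(lon_bnds.diff(dim="bnds")).squeeze(dim="bnds")
```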

@pochedls
Collaborator

pochedls commented Apr 3, 2025

@tomvothecoder – I did not realize that was the issue (that time_bnds has dimensions [member, time]). Running the following deals with this issue: ds['time_bnds'] = ds.time_bnds.isel(member=0).

The spatial averager does assume that bounds are 2D [lat, bnd] or [lon, bnd] and would fail if they were not.
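The isel(member=0) workaround can be seen on synthetic data (hypothetical shapes): selecting a single member collapses time_bnds back to the (time, bound) shape the old positional code expected.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"time_bnds": (("member", "time", "bound"), np.zeros((3, 4, 2)))}
)

# Drop the extra "member" dimension from the bounds variable.
ds["time_bnds"] = ds.time_bnds.isel(member=0)
# ds.time_bnds now has dims ("time", "bound")
```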

@tomvothecoder
Collaborator Author

tomvothecoder commented Apr 7, 2025

@tomvothecoder – I did not realize that was the issue (that time_bnds has dimensions [member, time]). Running the following deals with this issue: ds['time_bnds'] = ds.time_bnds.isel(member=0).

The spatial averager does assume that bounds are 2D [lat, bnd] or [lon, bnd] and would fail if they were not.

Got it, I'll open another ticket for us to handle this issue with the spatial averager API.

UPDATE: #750

Collaborator

@pochedls pochedls left a comment


This PR looks good to me.

- The previous method would take the difference of the time bounds using indexing, which assumed the bounds dimension was in a specific order. It did not take into account more than two dimensions (e.g., member, time, bounds)
@tomvothecoder tomvothecoder merged commit 4f6313e into main Apr 14, 2025
10 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in xCDAT Development Apr 14, 2025
@tomvothecoder tomvothecoder deleted the bugfix/748-group-avg branch April 14, 2025 16:37

Successfully merging this pull request may close these issues.

[Bug]: Coordinate ValueError using .group_average