@tomvothecoder
Collaborator

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label Nov 22, 2024
@tomvothecoder tomvothecoder self-assigned this Nov 22, 2024
@tomvothecoder tomvothecoder force-pushed the feature/565-temporal-bnds branch from 12afd27 to 3bf227c Compare November 22, 2024 22:11
@tomvothecoder
Collaborator Author

tomvothecoder commented Nov 22, 2024

@pochedls and @oliviermarti this PR should address this GH issue (same as this comment from @oliviermarti).

If you can check this branch out and try it that'd be great.

import numpy as np
import pandas as pd
import xarray as xr
import xcdat as xc

# Create a dummy xarray dataset
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
data = np.random.rand(len(time))
dummy_ds = xr.Dataset({"dummy_var": (["time"], data)}, coords={"time": time})
dummy_ds["time"].encoding["calendar"] = "standard"
dummy_ds = dummy_ds.bounds.add_missing_bounds(axes=["T"])

ds_avg = dummy_ds.temporal.group_average("dummy_var", freq="month")

Before -- no time_bnds and time starts at the beginning of the averaged period

ds_avg.time

<xarray.DataArray 'time' (time: 24)> Size: 192B
array([cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
		...
      dtype=object)
Coordinates:
  * time     (time) object 192B 2000-01-01 00:00:00 ... 2001-12-01 00:00:00
Attributes:
    bounds:   time_bnds

Result -- time is now centered using time_bnds

ds_avg.time

array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
		...
      dtype=object)
ds_avg.time_bnds

array([[cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 2, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 3, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 4, 1, 0, 0, 0, 0, has_year_zero=False)],
		...
      dtype=object)
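
For anyone checking the centered values by hand, here is a minimal sketch of the midpoint arithmetic, assuming the centered time is simply halfway between the lower and upper bound. It uses plain `datetime` objects rather than `cftime`, and `midpoint` is a hypothetical helper for illustration, not a function from xcdat.

```python
from datetime import datetime


def midpoint(lower, upper):
    """Return the time halfway between a lower and an upper bound."""
    return lower + (upper - lower) / 2


# Monthly bounds matching the first three rows of the time_bnds output above.
time_bnds = [
    (datetime(2000, 1, 1), datetime(2000, 2, 1)),
    (datetime(2000, 2, 1), datetime(2000, 3, 1)),
    (datetime(2000, 3, 1), datetime(2000, 4, 1)),
]

centered = [midpoint(lower, upper) for lower, upper in time_bnds]
for t in centered:
    print(t)
```

The three values land on 2000-01-16 12:00, 2000-02-15 12:00, and 2000-03-16 12:00, which agrees with the centered `ds_avg.time` shown above.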

@pochedls
Collaborator

@tomvothecoder – this is great – thanks for pushing this forward so quickly.

I think add_missing_bounds will work in most cases, but will fail for seasonal averages (and definitely custom seasons).

I think we'll need to collect the bounds for each group (e.g., group_bounds_array = [("2000-01-01 00:00", "2000-01-02 00:00"), ("2000-01-02 00:00", "2000-01-03 00:00"), ..., ("2000-01-31 00:00", "2000-02-01 00:00")]) and then take the min of the lower bounds and the max of the upper bounds (i.e., group_bnd = [np.min(group_bounds_array[:, 0]), np.max(group_bounds_array[:, 1])]).
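
A rough sketch of that min/max idea, under a few assumptions: the per-timestep bounds are plain `(lower, upper)` pairs of `datetime` objects, and built-in `min`/`max` stand in for `np.min`/`np.max` on the array columns. `group_bounds` and its `key` argument are hypothetical names for illustration, not part of xcdat.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_bounds(bounds, key):
    """Collapse per-timestep (lower, upper) bounds into one pair per group.

    `key` maps a lower bound to a group label, e.g. (year, month).
    """
    grouped = defaultdict(list)
    for lower, upper in bounds:
        grouped[key(lower)].append((lower, upper))

    # For each group, keep the earliest lower bound and the latest upper bound.
    return {
        label: (min(lo for lo, _ in pairs), max(hi for _, hi in pairs))
        for label, pairs in grouped.items()
    }


# Daily bounds covering January and February 2000 (31 + 29 = 60 days).
start = datetime(2000, 1, 1)
daily_bounds = [
    (start + timedelta(days=i), start + timedelta(days=i + 1)) for i in range(60)
]

monthly = group_bounds(daily_bounds, key=lambda t: (t.year, t.month))
for label, (lower, upper) in monthly.items():
    print(label, lower, upper)
```

Because the grouping is driven entirely by `key`, a seasonal or custom-season grouping would only need a different labeling function; the min/max step stays the same.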

@tomvothecoder
Collaborator Author

I think we'll need to collect the bounds for each group (e.g., group_bounds_array = [("2000-01-01 00:00", "2000-01-02 00:00"), ("2000-01-02 00:00", "2000-01-03 00:00"), ..., ("2000-01-31 00:00", "2000-02-01 00:00")]) and then take the min of the lower bounds and the max of the upper bounds (i.e., group_bnd = [np.min(group_bounds_array[:, 0]), np.max(group_bounds_array[:, 1])]).

This makes sense to me. I'll think of an algorithm.

Labels

project: seats-year5 A SEATS goal for year 5. type: enhancement New enhancement request

Projects

Status: Next Up

Development

Successfully merging this pull request may close these issues.

[Feature]: Retain bounds and compute time point for group averaging operations

2 participants