CFTimeIndex Resampling #2593

Merged: 72 commits merged on Feb 3, 2019. The file changes shown below are from 62 commits.

Commits:
daa3a71
First implementation of resampling for CFTimeIndex.
jwenfai Nov 9, 2018
f9f3347
First implementation of resampling for CFTimeIndex, cleaned.
jwenfai Nov 9, 2018
0950505
First implementation of resampling for CFTimeIndex, cleaned.
jwenfai Nov 9, 2018
89f418a
First implementation of resampling for CFTimeIndex, cleaned.
jwenfai Nov 9, 2018
39c9d11
First implementation of resampling for CFTimeIndex.
jwenfai Nov 12, 2018
073b8e0
First implementation of resampling for CFTimeIndex,
jwenfai Nov 12, 2018
193c4c4
First implementation of resampling for CFTimeIndex, test file written.
jwenfai Nov 15, 2018
2c97738
First implementation of resampling for CFTimeIndex, test file written…
jwenfai Nov 15, 2018
9993ed9
First implementation of resampling for CFTimeIndex, test file written…
jwenfai Nov 15, 2018
f01745c
First implementation of resampling for CFTimeIndex, test file written…
jwenfai Nov 27, 2018
ffbf265
First implementation of resampling for CFTimeIndex, test file written…
jwenfai Dec 5, 2018
e64fedb
Merge pull request #1 from jwenfai/resample-v2-clean
jwenfai Dec 5, 2018
2850dd5
Docstrings for resample_cftime.py written. Upsample still not fixed.
jwenfai Dec 8, 2018
770b778
Fixed PEP8 and test parametrization.
jwenfai Dec 8, 2018
181e82c
PEP8
Zeitsperre Dec 12, 2018
5a41ee2
Merge pull request #3 from Ouranosinc/PEP8
Zeitsperre Dec 12, 2018
6b948c5
Test file fixes and other optimizations (2018-12-16 @spencerclark and…
jwenfai Dec 18, 2018
97c0948
Merge pull request #1 from Ouranosinc/master
jwenfai Dec 18, 2018
63d25ab
Merge remote-tracking branch 'origin/resample-v2-clean' into resample…
jwenfai Dec 18, 2018
05af869
Test file fixes and other optimizations (2018-12-16 @spencerclark and…
jwenfai Dec 18, 2018
85f1a84
Merge branch 'resample-v2-upsample' into resample-v2-clean
jwenfai Dec 18, 2018
2e8ced3
Merge pull request #4 from jwenfai/resample-v2-clean
Zeitsperre Dec 18, 2018
4317c69
Merge branch 'master' into master
jwenfai Dec 19, 2018
e7deeb2
Merge pull request #2 from Ouranosinc/master
jwenfai Jan 9, 2019
f9ac1a1
Merge pull request #3 from Ouranosinc/master
jwenfai Jan 9, 2019
8505ca9
_get_range_edges logic changed to emulate latest version of pandas.
jwenfai Jan 9, 2019
a495c2d
Simplified resampling logic (errors persist). Pre-cleaning.
jwenfai Jan 12, 2019
ad65ef0
Simplified resampling logic (error persists). Cleaned.
jwenfai Jan 12, 2019
5775e11
Simplified resampling logic (error persists). Fixed first_items.dropn…
jwenfai Jan 13, 2019
e1902fe
Simplified resampling logic (error persists). Logic slightly altered …
jwenfai Jan 16, 2019
f82500c
Simplified resampling logic (error persists). Logic slightly altered …
jwenfai Jan 16, 2019
80914e0
Merge pull request #5 from jwenfai/resample-v2-upsample
jwenfai Jan 16, 2019
1b3f41a
Simplified resampling logic (error persists). Cleaned. Merged with la…
jwenfai Jan 18, 2019
bc95f55
Precise cftime arithmetic. Reduced overall test time. Added test for …
jwenfai Jan 19, 2019
9fa4d51
Merge remote-tracking branch 'origin/master'
jwenfai Jan 19, 2019
5227480
Merge pull request #6 from jwenfai/master
jwenfai Jan 19, 2019
77bb2aa
Added default values for closed and label args of resample function i…
jwenfai Jan 19, 2019
be2e657
Merge pull request #7 from jwenfai/master
jwenfai Jan 19, 2019
a18161d
Added back replace['dayofwk'] = -1 to cftime_offsets.py and cftimeind…
jwenfai Jan 20, 2019
9582fbf
Merge pull request #8 from jwenfai/master
jwenfai Jan 20, 2019
5737546
Optimizations as per https://github.com/pydata/xarray/pull/2593/#pull…
jwenfai Jan 20, 2019
3268fc4
Merge pull request #9 from jwenfai/master
jwenfai Jan 20, 2019
6f38935
Simple test for non-standard calendars added and documentation updated.
jwenfai Jan 21, 2019
e7986c5
Simple test for non-standard calendars added and documentation updated.
jwenfai Jan 21, 2019
c64265f
Merge pull request #10 from jwenfai/master
jwenfai Jan 21, 2019
71f98db
Merge branch 'master' into master
jwenfai Jan 21, 2019
ec4e460
Added loffset support to CFTimeIndex resampling. Better adherence to …
jwenfai Jan 22, 2019
47b0eaa
Added loffset support to CFTimeIndex resampling. Better adherence to …
jwenfai Jan 22, 2019
f2ecaf6
Merge pull request #11 from jwenfai/master
jwenfai Jan 22, 2019
cd266c2
Support datetime.timedelta objects for loffset. Improved test coverage.
jwenfai Jan 22, 2019
41783cb
Merge pull request #12 from jwenfai/master
jwenfai Jan 22, 2019
5435910
Removed support for Python 2 compatibility.
jwenfai Jan 27, 2019
35b40fb
Merge pull request #13 from jwenfai/master
jwenfai Jan 27, 2019
814a04d
Updated pandas minversion to 0.24 as 0.24 is officially out.
jwenfai Jan 27, 2019
2a9402e
Merge branch 'pydata-master'
jwenfai Jan 27, 2019
505a0fa
Removed Python 2 support from test_cftimeindex_resample.py.
jwenfai Jan 27, 2019
9fbb016
Merge branch 'master' into master
jwenfai Jan 27, 2019
0820c3b
Merge pull request #14 from jwenfai/master
jwenfai Jan 27, 2019
89bc708
Moved full_index and first_items generation logic to a helper functio…
jwenfai Jan 29, 2019
31ccebf
Merge remote-tracking branch 'origin/master'
jwenfai Jan 29, 2019
8ac6f76
Merge pull request #15 from jwenfai/master
jwenfai Jan 29, 2019
8dbee52
Merge branch 'master' into master
shoyer Feb 1, 2019
afad30d
In groupby.py, moved s to _get_index_and_items helper function.
jwenfai Feb 1, 2019
1381dab
Removed redundant code from test_formatting.py due to bad merge.
jwenfai Feb 2, 2019
6074548
Merge pull request #16 from jwenfai/master
jwenfai Feb 2, 2019
1010264
Merge branch 'master' into master
jwenfai Feb 2, 2019
6c4b609
Merge branch 'pydata-master'
jwenfai Feb 2, 2019
59f1f94
Removed redundant test and simplify code now that dropna is implemented.
jwenfai Feb 2, 2019
db62a96
Merge branch 'master' into master
jwenfai Feb 2, 2019
6edb45a
Merge pull request #17 from jwenfai/master
jwenfai Feb 2, 2019
f7f2c38
delete unnecessary test
shoyer Feb 2, 2019
ef68960
eliminate some repetition
shoyer Feb 2, 2019

20 changes: 11 additions & 9 deletions doc/time-series.rst
@@ -309,31 +309,34 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

da.differentiate('time')

- And serialization:
- Serialization:

.. ipython:: python

da.to_netcdf('example-no-leap.nc')
xr.open_dataset('example-no-leap.nc')

- And resampling along the time dimension for data indexed by a :py:class:`~xarray.CFTimeIndex`:

.. ipython:: python

da.resample(time='81T', closed='right', label='right', base=3).mean()

.. note::

While much of the time series functionality that is possible for standard
dates has been implemented for dates from non-standard calendars, there are
still some remaining important features that have yet to be implemented,
for example:

- Resampling along the time dimension for data indexed by a
:py:class:`~xarray.CFTimeIndex` (:issue:`2191`, :issue:`2458`)
- Built-in plotting of data with :py:class:`cftime.datetime` coordinate axes
(:issue:`2164`).

Review comment (Member):

The example below is not super-relevant now. Maybe delete:

(e.g. to allow the use of some forms of resample with non-standard calendars).

as well as the line that calls resample at the end of the code block?

For some use-cases it may still be useful to convert from
a :py:class:`~xarray.CFTimeIndex` to a :py:class:`pandas.DatetimeIndex`,
despite the difference in calendar types (e.g. to allow the use of some
forms of resample with non-standard calendars). The recommended way of
doing this is to use the built-in
:py:meth:`~xarray.CFTimeIndex.to_datetimeindex` method:
despite the difference in calendar types. The recommended way of doing this
is to use the built-in :py:meth:`~xarray.CFTimeIndex.to_datetimeindex`
method:

.. ipython:: python
:okwarning:
@@ -343,8 +346,7 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
da
datetimeindex = da.indexes['time'].to_datetimeindex()
da['time'] = datetimeindex
da.resample(time='Y').mean('time')


However in this case one should use caution to only perform operations which
do not depend on differences between dates (e.g. differentiation,
interpolation, or upsampling with resample), as these could introduce subtle
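The caution above about operations that depend on differences between dates can be made concrete. Every date in a 365-day ("noleap") calendar also exists in the standard calendar, so a conversion keeps each timestamp, but the spacing across February changes in leap years. The stdlib sketch below counts days in a noleap calendar by hand; `noleap_ordinal` and `days_between_noleap` are hypothetical helpers, not xarray or cftime functions.

```python
from datetime import date

# Days elapsed before the first of each month in a 365-day ("noleap") calendar.
_NOLEAP_CUMDAYS = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def noleap_ordinal(year, month, day):
    """Hypothetical helper: days since 0001-01-01 in a noleap calendar."""
    return (year - 1) * 365 + _NOLEAP_CUMDAYS[month - 1] + (day - 1)


def days_between_noleap(a, b):
    """Signed day count from date tuple a to date tuple b (noleap calendar)."""
    return noleap_ordinal(*b) - noleap_ordinal(*a)


feb28, mar1 = (2000, 2, 28), (2000, 3, 1)
print(days_between_noleap(feb28, mar1))   # 1: noleap has no Feb 29
print((date(*mar1) - date(*feb28)).days)  # 2: 2000 is a Gregorian leap year
```

A derivative or interpolation computed after such a conversion would silently use the second spacing instead of the first.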
5 changes: 4 additions & 1 deletion doc/whats-new.rst
@@ -47,6 +47,10 @@ Enhancements
report showing what exactly differs between the two objects (dimensions /
coordinates / variables / attributes) (:issue:`1507`).
By `Benoit Bovy <https://github.com/benbovy>`_.
- Resampling of standard and non-standard calendars indexed by
:py:class:`~xarray.CFTimeIndex` is now possible. (:issue:`2191`).
By `Jwen Fai Low <https://github.com/jwenfai>`_ and
`Spencer Clark <https://github.com/spencerkclark>`_.
- Add ``tolerance`` option to ``resample()`` methods ``bfill``, ``pad``,
``nearest``. (:issue:`2695`)
By `Hauke Schulz <https://github.com/observingClouds>`_.
@@ -57,7 +61,6 @@ Enhancements
(:issue:`1332`)
By `Keisuke Fujii <https://github.com/fujiisoup>`_.


Bug fixes
~~~~~~~~~

25 changes: 21 additions & 4 deletions xarray/coding/cftime_offsets.py
@@ -358,29 +358,41 @@ def rollback(self, date):
 class Day(BaseCFTimeOffset):
     _freq = 'D'
 
+    def as_timedelta(self):
+        return timedelta(days=self.n)
+
     def __apply__(self, other):
-        return other + timedelta(days=self.n)
+        return other + self.as_timedelta()
 
 
 class Hour(BaseCFTimeOffset):
     _freq = 'H'
 
+    def as_timedelta(self):
+        return timedelta(hours=self.n)
+
     def __apply__(self, other):
-        return other + timedelta(hours=self.n)
+        return other + self.as_timedelta()
 
 
 class Minute(BaseCFTimeOffset):
     _freq = 'T'
 
+    def as_timedelta(self):
+        return timedelta(minutes=self.n)
+
     def __apply__(self, other):
-        return other + timedelta(minutes=self.n)
+        return other + self.as_timedelta()
 
 
 class Second(BaseCFTimeOffset):
     _freq = 'S'
 
+    def as_timedelta(self):
+        return timedelta(seconds=self.n)
+
     def __apply__(self, other):
-        return other + timedelta(seconds=self.n)
+        return other + self.as_timedelta()
 
 
 _FREQUENCIES = {
@@ -427,6 +439,11 @@ def __apply__(self, other):
     _FREQUENCY_CONDITION)
 
 
+# pandas defines these offsets as "Tick" objects, which for instance have
+# distinct behavior from monthly or longer frequencies in resample.
+CFTIME_TICKS = (Day, Hour, Minute, Second)
+
+
 def to_offset(freq):
     """Convert a frequency string to the appropriate subclass of
     BaseCFTimeOffset."""
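The refactor gives each fixed-length offset an `as_timedelta` method and groups those classes in `CFTIME_TICKS`, so resampling code can treat any tick as a single `timedelta`, while monthly and longer offsets (whose length varies) take a different path. The stdlib sketch below shows that pattern only; its classes merely mimic xarray's `Day`/`Hour`/`Minute`/`Second`, the `_unit` attribute and `apply` method are hypothetical stand-ins for the real `_freq` and `__apply__`, and plain `datetime` replaces cftime dates.

```python
from datetime import datetime, timedelta


class TickOffset:
    """Hypothetical fixed-length offset; subclasses set a timedelta unit."""
    _unit = None  # keyword understood by datetime.timedelta, e.g. 'hours'

    def __init__(self, n=1):
        self.n = n

    def as_timedelta(self):
        # A tick always spans the same duration, so n ticks is one timedelta.
        return timedelta(**{self._unit: self.n})

    def apply(self, other):
        # Shifting a date reuses as_timedelta, keeping the unit in one place.
        return other + self.as_timedelta()


class Day(TickOffset):
    _unit = 'days'


class Hour(TickOffset):
    _unit = 'hours'


class Minute(TickOffset):
    _unit = 'minutes'


class Second(TickOffset):
    _unit = 'seconds'


TICKS = (Day, Hour, Minute, Second)

offset = Minute(81)
print(isinstance(offset, TICKS))           # True
print(offset.apply(datetime(2000, 1, 1)))  # 2000-01-01 01:21:00
print(offset.as_timedelta() * 3)           # 4:03:00
```

Because every tick reduces to a `timedelta`, bin-edge arithmetic such as subtraction and modulo can be written once for all four classes.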
23 changes: 8 additions & 15 deletions xarray/core/common.py
@@ -756,23 +756,16 @@ def resample(self, indexer=None, skipna=None, closed=None, label=None,
         dim_coord = self[dim]
 
         if isinstance(self.indexes[dim_name], CFTimeIndex):
-            raise NotImplementedError(
-                'Resample is currently not supported along a dimension '
-                'indexed by a CFTimeIndex. For certain kinds of downsampling '
-                'it may be possible to work around this by converting your '
-                'time index to a DatetimeIndex using '
-                'CFTimeIndex.to_datetimeindex. Use caution when doing this '
-                'however, because switching to a DatetimeIndex from a '
-                'CFTimeIndex with a non-standard calendar entails a change '
-                'in the calendar type, which could lead to subtle and silent '
-                'errors.'
-            )
-
+            from .resample_cftime import CFTimeGrouper
+            grouper = CFTimeGrouper(freq, closed, label, base, loffset)
+        else:
+            # TODO: to_offset() call required for pandas==0.19.2
+            grouper = pd.Grouper(freq=freq, closed=closed, label=label,
+                                 base=base,
+                                 loffset=pd.tseries.frequencies.to_offset(
+                                     loffset))
         group = DataArray(dim_coord, coords=dim_coord.coords,
                           dims=dim_coord.dims, name=RESAMPLE_DIM)
-        # TODO: to_offset() call required for pandas==0.19.2
-        grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base,
-                             loffset=pd.tseries.frequencies.to_offset(loffset))
         resampler = self._resample_cls(self, group=group, dim=dim_name,
                                        grouper=grouper,
                                        resample_dim=RESAMPLE_DIM)
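The new branch hands `freq`, `closed`, `label`, `base` and `loffset` to `CFTimeGrouper`. For a sub-daily tick frequency such as the documentation's `time='81T', closed='right', label='right', base=3`, `closed` and `base` decide where the outer bin edges fall. The sketch below is modeled on the anchoring rule pandas uses for Tick frequencies (one of the commits says `_get_range_edges` was changed to emulate pandas); it works on plain `datetime` objects, takes the frequency and the `base` shift as `timedelta`s, ignores pandas' special cases for daily frequencies, and `anchored_edges` is a hypothetical name rather than xarray's implementation.

```python
from datetime import datetime, timedelta


def anchored_edges(first, last, step, closed='left', base_shift=timedelta(0)):
    """Hypothetical helper: outer bin edges covering first..last.

    Edges sit at midnight of first's day, shifted by base_shift (reduced
    modulo step), and repeat every step.
    """
    anchor = datetime(first.year, first.month, first.day) + base_shift % step
    # How far each endpoint lies past the most recent anchored edge;
    # timedelta % timedelta is non-negative for a positive step.
    foffset = (first - anchor) % step
    loffset = (last - anchor) % step
    if closed == 'right':
        # Bins are (edge, edge + step]: a point exactly on an edge belongs
        # to the bin ending there, so an aligned start steps back one bin.
        start = first - foffset if foffset else first - step
        end = last + (step - loffset) if loffset else last
    else:
        # Bins are [edge, edge + step): an aligned end needs one more bin.
        start = first - foffset
        end = last + (step - loffset) if loffset else last + step
    return start, end


lo_edge, hi_edge = anchored_edges(datetime(2000, 1, 1, 0, 0),
                                  datetime(2000, 1, 1, 4, 0),
                                  timedelta(minutes=81), closed='right',
                                  base_shift=timedelta(minutes=3))
print(lo_edge, hi_edge)  # 1999-12-31 22:42:00 2000-01-01 04:06:00
```

With `base=3` the edges fall at 00:03 plus multiples of 81 minutes, which is why the first bin starts on the previous day.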
27 changes: 22 additions & 5 deletions xarray/core/groupby.py
@@ -259,11 +259,8 @@ def __init__(self, obj, group, squeeze=False, grouper=None, bins=None,
                 # TODO: sort instead of raising an error
                 raise ValueError('index must be monotonic for resampling')
             s = pd.Series(np.arange(index.size), index)

Review comment (Member):

Could you make this object `s` inside the helper function instead? It's not needed outside here.

-            first_items = s.groupby(grouper).first()
-            _apply_loffset(grouper, first_items)
-            full_index = first_items.index
-            if first_items.isnull().any():
-                first_items = first_items.dropna()

Review comment (Member):

It would be nice if dropna() worked for a Series indexed by a CFTimeIndex. I'll see if I can sort that out.

Reply (Contributor Author):

Now that dropna() works for a Series indexed by a CFTimeIndex (#2734), the logic here will be simplified once #2734 is merged into master.

+            full_index, first_items = self._get_index_and_items(
+                index, s, grouper)
             sbins = first_items.values.astype(np.int64)
             group_indices = ([slice(i, j)
                               for i, j in zip(sbins[:-1], sbins[1:])] +
@@ -310,6 +307,26 @@ def __len__(self):
     def __iter__(self):
         return zip(self._unique_coord.values, self._iter_grouped())
 
+    def _get_index_and_items(self, index, s, grouper):
+        from .resample_cftime import CFTimeGrouper
+        if isinstance(grouper, CFTimeGrouper):
+            first_items = grouper.first_items(index)
+            full_index = first_items.index
+            if first_items.isnull().any():

Review comment (Member):

If you merge in master again, you could switch this block back to using Series.dropna().

Reply (Contributor Author):

Done.

+                index_dict = dict(zip(np.arange(first_items.size),
+                                      first_items.index.values))
+                first_items.index = np.arange(first_items.size)
+                first_items = first_items.dropna()
+                first_items.index = [index_dict[i] for i in
+                                     first_items.index.values]
+        else:
+            first_items = s.groupby(grouper).first()
+            _apply_loffset(grouper, first_items)
+            full_index = first_items.index
+            if first_items.isnull().any():
+                first_items = first_items.dropna()
+        return full_index, first_items
+
     def _iter_grouped(self):
         """Iterate over each element in this group"""
         for indices in self._group_indices:
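The `first_items`, `sbins` and `group_indices` steps above turn bin edges into integer slices over the sorted time axis: every bin is kept in `full_index`, but empty bins are dropped before slicing. The stdlib sketch below reproduces that idea with `bisect` on plain `datetime` values; `first_items` and `index_and_groups` here are hypothetical names, not xarray functions, and the bin edges are assumed to be given already (for example from a pandas-style anchoring rule).

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def first_items(times, edges, closed='left', label='left'):
    """Pair each bin label with the position of its first element (None if empty).

    times must be sorted; edges are ascending bin boundaries.
    """
    items = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if closed == 'left':  # bin is [lo, hi)
            j = bisect_left(times, lo)
            inside = j < len(times) and times[j] < hi
        else:                 # bin is (lo, hi]
            j = bisect_right(times, lo)
            inside = j < len(times) and times[j] <= hi
        items.append((lo if label == 'left' else hi, j if inside else None))
    return items


def index_and_groups(times, edges, closed='left', label='left'):
    items = first_items(times, edges, closed, label)
    full_index = [lab for lab, _ in items]  # every bin, empty or not
    kept = [(lab, pos) for lab, pos in items if pos is not None]  # "dropna"
    sbins = [pos for _, pos in kept]
    if not sbins:
        return full_index, kept, []
    # Each non-empty bin runs up to the first element of the next one.
    group_indices = [slice(i, j) for i, j in zip(sbins[:-1], sbins[1:])]
    group_indices.append(slice(sbins[-1], None))
    return full_index, kept, group_indices


times = [datetime(2000, 1, 1, 0, m) for m in (0, 10, 50, 55)]
edges = [datetime(2000, 1, 1, 0, 0), datetime(2000, 1, 1, 0, 20),
         datetime(2000, 1, 1, 0, 40), datetime(2000, 1, 1, 1, 0)]
full_index, kept, groups = index_and_groups(times, edges)
print(len(full_index), [pos for _, pos in kept], groups)
# 3 [0, 2] [slice(0, 2, None), slice(2, None, None)]
```

In the diff these positions come from a pandas Series (`s.groupby(grouper).first()` or `grouper.first_items(index)`), where an empty bin shows up as NaN rather than `None`, hence the `isnull()`/`dropna()` handling.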