CFTimeIndex Resampling #2593
Changes from 14 commits
@@ -234,7 +234,29 @@ def __init__(self, obj, group, squeeze=False, grouper=None, bins=None,
        # TODO: sort instead of raising an error
        raise ValueError('index must be monotonic for resampling')
    s = pd.Series(np.arange(index.size), index)
    first_items = s.groupby(grouper).first()
    if isinstance(grouper, tuple):
Review comment: I've been thinking about this logic over the last few days. With some minor modifications to […]
        if grouper[0] == 'downsampling':
            # if we're downsampling CFTimeIndex, do this:
            labels = grouper[1]
            binner = grouper[2]
            first_items = s.groupby(binner).first().reindex(labels)
            # reindex(grouper[1]) adds empty np.nan bins to
            # emulate pandas behavior
        elif grouper[0] == 'upsampling':
            # if we're upsampling CFTimeIndex, do this:
            labels = grouper[1]
            binner = grouper[2]
            closed = grouper[3]
            if closed == 'right':
                first_items = s.reindex(pd.Index(binner),
                                        method='nearest')
                first_items.index = labels
            else:
                first_items = s.reindex(pd.Index(binner),
                                        method='bfill')
                first_items.index = labels
    else:
        first_items = s.groupby(grouper).first()
    full_index = first_items.index
    if first_items.isnull().any():
        first_items = first_items.dropna()
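The inline comment about reindexing to emulate pandas is easier to see with a small, self-contained pandas example. The snippet below is illustrative only and is not part of the PR: grouping keeps only the bins that contain data, while reindexing against the full set of labels re-inserts the empty bins as NaN, matching what pandas' resample produces.

    import numpy as np
    import pandas as pd

    index = pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-04'])
    s = pd.Series(np.arange(index.size), index)
    # daily bin labels; 2000-01-02 contains no data
    labels = pd.date_range('2000-01-01', '2000-01-04', freq='D')
    binner = index.floor('D')  # the bin label for each timestamp
    first_items = s.groupby(binner).first().reindex(labels)
    print(first_items)  # 2000-01-02 appears as NaN, like an empty resample bin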
Review comment: It would be nice if […]
@@ -194,14 +194,35 @@ def _interpolate(self, kind='linear'):
            .format(self._obj.data.name)
        )

    x = self._obj[self._dim].astype('float')
    # from ..coding.cftimeindex import CFTimeIndex
    import cftime as cf
    import numpy as np
    if isinstance(self._obj[self._dim].values[0], cf.datetime):
Review comment: I think something like […]
        t = self._obj[self._dim]
        x = np.insert([td.total_seconds() for td in
                       t[1:].values - t[:-1].values], 0, 0).cumsum()

Review comment: I think you can just make use of x = datetime_to_numeric(t, datetime_unit='s')
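A rough sketch of what that suggestion might look like, assuming xarray's internal helper datetime_to_numeric (in xarray.core.duck_array_ops around the time of this PR) accepts cftime datetimes; the wrapper name _times_to_float_seconds is hypothetical and only for illustration:

    from xarray.core.duck_array_ops import datetime_to_numeric

    def _times_to_float_seconds(times):
        # Seconds since the earliest value in `times`, replacing the
        # per-element total_seconds() loop with a single vectorized call.
        return datetime_to_numeric(times, datetime_unit='s')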
        # calling total_seconds is potentially bad for performance
        x = x.round()
Review comment: One other minor thing -- I'm inclined not to round here. I realize that in the climate community, with non-standard calendars, it would be pretty unusual to encounter dates with a time resolution finer than one second. However, in xarray my sense is we should avoid making assumptions about this. Fortunately, with recent updates in cftime, this is becoming less and less of an issue. See the difference in behavior between cftime 1.0.0 and cftime 1.0.3.4 (the latest version):

cftime 1.0.0: Here differences in datetime arithmetic precision lead to small differences in the interpolation results between data indexed by a CFTimeIndex and data indexed by a DatetimeIndex. Note that we can use […]

cftime 1.0.3.4: For this example at least, the issues with datetime arithmetic precision have been resolved.

Any errors in interpolation due to microsecond noise in the ordinal coordinate would be expected to be small (I think we could account for this in the tests by using […]).
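The comparison code the comment refers to is not shown above, but the tolerance-based check it alludes to might look roughly like the sketch below. The names da_cftime and da_datetime are hypothetical stand-ins for the same data indexed by a CFTimeIndex and a DatetimeIndex respectively, and the atol value is only a placeholder for whatever level of microsecond-scale noise is considered acceptable:

    import numpy as np

    def check_interp_agreement(da_cftime, da_datetime, freq='6H'):
        # Upsample-and-interpolate both representations and require the
        # results to agree up to a small absolute tolerance.
        result = da_cftime.resample(time=freq).interpolate('linear')
        expected = da_datetime.resample(time=freq).interpolate('linear')
        np.testing.assert_allclose(result.values, expected.values, atol=1e-6)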
        # Rounding fixes erroneous microsecond offsets in timedelta
        # (fault of CFTime), but destroys microsecond resolution data
    else:
        x = self._obj[self._dim].astype('float')
    y = self._obj.data

    axis = self._obj.get_axis_num(self._dim)

    f = interp1d(x, y, kind=kind, axis=axis, bounds_error=True,
                 assume_sorted=True)
    new_x = self._full_index.values.astype('float')
    if isinstance(self._full_index.values[0], cf.datetime):
        t = self._full_index
        new_x = np.insert([td.total_seconds() for td in
                           t[1:].values - t[:-1].values], 0, 0).cumsum()
        # calling total_seconds is potentially bad for performance
        new_x = new_x.round()
        # Rounding fixes erroneous microsecond offsets in timedelta
        # (fault of CFTime), but destroys microsecond resolution data
    else:
        new_x = self._full_index.values.astype('float')

    # construct new up-sampled DataArray
    dummy = self._obj.copy()
@@ -0,0 +1,151 @@
"""
Review comment: I think following what we've done in similar situations, we might want to add a copy of the pandas license at the top of this file (see here, for example, following #2301 (comment)).
CFTimeIndex port of pandas resampling
(pandas/pandas/core/resample.py)
Does not support non-integer freq
"""
from __future__ import absolute_import, division, print_function
Review comment: Another small thing -- since xarray is no longer worried about Python 2 compatibility, we can remove these imports (xref: #2645).

Reply: Removed the import statements. Same failures as before.
import datetime
from ..coding.cftimeindex import CFTimeIndex
from ..coding.cftime_offsets import (cftime_range, normalize_date,
                                     Day, Hour, Minute, Second)

def _get_time_bins(index, freq, closed, label, base):
    # This portion of code comes from TimeGrouper __init__ #
    end_types = {'M', 'A'}
Review comment: Perhaps it might be cleaner to separate some of this logic into another method. For example define:

    def _default_closed_or_label(freq):
        if freq._freq in {'M', 'A'}:
            return 'right'
        else:
            return 'left'

Then within […]:

    if closed is None:
        closed = _default_closed_or_label(freq)
    if label is None:
        label = _default_closed_or_label(freq)
    if freq._freq in end_types:
        if closed is None:
            closed = 'right'
        if label is None:
            label = 'right'
    else:
        if closed is None:
            closed = 'left'
        if label is None:
            label = 'left'
    # This portion of code comes from TimeGrouper __init__ #

    if not isinstance(index, CFTimeIndex):
        raise TypeError('index must be a CFTimeIndex, but got '
                        'an instance of %r' % type(index).__name__)
    if len(index) == 0:
        binner = labels = CFTimeIndex(data=[], name=index.name)
        return binner, [], labels

    first, last = _get_range_edges(index.min(), index.max(), freq,
                                   closed=closed,
                                   base=base)
    binner = labels = cftime_range(freq=freq,
                                   start=first,
                                   end=last,
                                   name=index.name)

    if len(binner) > 1 and binner[-1] < last:
        extra_date_range = cftime_range(binner[-1], last + freq,
                                        freq=freq, name=index.name)
        binner = labels = CFTimeIndex(binner.append(extra_date_range[1:]))

    trimmed = False
    if len(binner) > 2 and binner[-2] == last and closed == 'right':
        binner = binner[:-1]
        trimmed = True

    if closed == 'right':
        labels = binner
        if label == 'right':
            labels = labels[1:]
        elif not trimmed:
            labels = labels[:-1]
    else:
        if label == 'right':
            labels = labels[1:]
        elif not trimmed:
            labels = labels[:-1]
    return binner, labels

def _adjust_bin_edges(binner, ax_values, freq):
Review comment: This method does not appear to be used in this implementation. Should it be?

Reply: Yeah, I also noticed I've been too slavishly copying pandas logic. This and some other unnecessary code have been/will be removed.
    # Some hacks for > daily data, see #1471, #1458, #1483
    if freq._freq not in ['D', 'H', 'T', 'min', 'S']:
Review comment: It might be worth adding a […]

    # pandas defines these offsets as "Tick" objects, which for instance have
    # distinct behavior from monthly or longer frequencies in resample.
    CFTIME_TICKS = (Day, Hour, Minute, Second)

Then here and in other methods this check could just be something like […]
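The exact check the comment points to is cut off above, but it presumably amounts to an isinstance test against that tuple instead of comparing freq._freq to a list of strings. A hedged sketch (the helper name _is_tick is not from the PR):

    from xarray.coding.cftime_offsets import Day, Hour, Minute, Second

    # pandas-style "Tick" offsets: daily and sub-daily frequencies.
    CFTIME_TICKS = (Day, Hour, Minute, Second)

    def _is_tick(offset):
        # Replaces string checks such as: freq._freq not in ['D', 'H', 'T', 'min', 'S']
        return isinstance(offset, CFTIME_TICKS)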
        # intraday values on last day
        if binner[-2] > ax_values.max():
            binner = binner[:-1]
    return binner

def _get_range_edges(first, last, offset, closed='left', base=0):
    if offset._freq in ['D', 'H', 'T', 'min', 'S']:
        is_day = isinstance(offset, Day)
        if (is_day and offset.n == 1) or not is_day:
            return _adjust_dates_anchored(first, last, offset,
                                          closed=closed, base=base)
    else:
        first = normalize_date(first)
        last = normalize_date(last)

    if closed == 'left':
        first = offset.rollback(first)
    else:
        first = first - offset

    last = last + offset
    return first, last

def _adjust_dates_anchored(first, last, offset, closed='right', base=0):
    base = base % offset.n
    start_day = normalize_date(first)
    base_td = datetime.timedelta(0)
    if offset._freq == 'D':
Review comment: If we defined an […]

    base_td = type(offset)(n=base).as_timedelta()
        base_td = datetime.timedelta(days=base)
    elif offset._freq == 'H':
        base_td = datetime.timedelta(hours=base)
    elif offset._freq in ['T', 'min']:
        base_td = datetime.timedelta(minutes=base)
    elif offset._freq == 'S':
        base_td = datetime.timedelta(seconds=base)
    offset_td = _offset_timedelta(offset)
    start_day += base_td
    foffset = (first - start_day) % offset_td
    loffset = (last - start_day) % offset_td
    if closed == 'right':
        if foffset.total_seconds() > 0:
            fresult = first - foffset
        else:
            fresult = first - offset_td

        if loffset.total_seconds() > 0:
            lresult = last + (offset_td - loffset)
        else:
            lresult = last
    else:
        if foffset.total_seconds() > 0:
            fresult = first - foffset
        else:
            fresult = first

        if loffset.total_seconds() > 0:
            lresult = last + (offset_td - loffset)
        else:
            lresult = last + offset_td
    return fresult, lresult

def _offset_timedelta(offset):
Review comment: It might make sense to add an […]

    class Day(BaseCFTimeOffset):
        _freq = 'D'

        def as_timedelta(self):
            return timedelta(days=self.n)

        def __apply__(self, other):
            return other + self.as_timedelta()

That way we would not need this method to do the translation.
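If the offset classes exposed an as_timedelta() method as proposed, the translation code in this module would shrink considerably. The sketch below only illustrates that direction and is not code from the PR (and _base_timedelta is a made-up helper name):

    def _offset_timedelta(offset):
        # Any tick-like offset (Day, Hour, Minute, Second) knows its own timedelta.
        return offset.as_timedelta()

    def _base_timedelta(offset, base):
        # Replaces the _freq-based if/elif chain in _adjust_dates_anchored.
        return type(offset)(n=base).as_timedelta()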
    if isinstance(offset, Day):
        return datetime.timedelta(days=offset.n)
    elif isinstance(offset, Hour):
        return datetime.timedelta(hours=offset.n)
    elif isinstance(offset, Minute):
        return datetime.timedelta(minutes=offset.n)
    elif isinstance(offset, Second):
        return datetime.timedelta(seconds=offset.n)

def _adjust_binner_for_upsample(binner, closed):
    if closed == 'right':
        binner = binner[1:]
    else:
        binner = binner[:-1]
    return binner
@@ -14,7 +14,7 @@
import numpy as np
import pandas as pd

from xarray.core.alignment import align
# from xarray.core.alignment import align
Review comment: Was there a reason for commenting out this line?

Reply: I don't recall working on the plot.py file. It seems to be an unused import statement, so that might've been why it's been commented out.
from xarray.core.common import contains_cftime_datetimes
from xarray.core.pycompat import basestring

@@ -255,7 +255,6 @@ def _infer_line_data(darray, x, y, hue):
        huelabel = label_from_attrs(darray[huename])
        hueplt = darray[huename]

    xlabel = label_from_attrs(xplt)
    ylabel = label_from_attrs(yplt)
Review comment: Could you make this object s inside the helper function instead? It's not needed outside here.