-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFTimeIndex Resampling #2593
CFTimeIndex Resampling #2593
Conversation
more bugs fixed, cleaned.
Resample v2 clean
Hello @jwenfai! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 02, 2019 at 20:46 Hours UTC |
list(itertools.product( | ||
['left', 'right'], | ||
['left', 'right'], | ||
['2MS', '2M', '3MS', '3M', '7MS', '7M']))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brief note on the test approach - overall looks great - on these items, you can put each set of params in its own parametrize
decorator, and avoid making the products yourself
(no need to change this time though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment, I made the change, much better than the torturous code I was writing before!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwenfai I'm excited to see some progress on this! I may be a bit slow these next few days, but I'll try to provide some more extensive feedback soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jwenfai thanks for your patience; I was away at a conference all of last week. Here are some initial comments on this PR, which is a very good start.
I'll probably need a few more passes on this, particularly to understand the issues with the upsampling portion.
xarray/core/resample_cftime.py
Outdated
|
||
def _get_time_bins(index, freq, closed, label, base): | ||
# This portion of code comes from TimeGrouper __init__ # | ||
end_types = {'M', 'A'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it might be cleaner to separate some of this logic into another method. For example define:
def _default_closed_or_label(freq):
if freq._freq in {'M', 'A'}:
return 'right'
else:
return 'left'
Then within _get_time_bins
you could use something like this:
if closed is None:
closed = _default_closed_or_label(freq)
if label is None:
label = _default_closed_or_label(freq)
xarray/core/resample_cftime.py
Outdated
|
||
def _adjust_bin_edges(binner, ax_values, freq): | ||
# Some hacks for > daily data, see #1471, #1458, #1483 | ||
if freq._freq not in ['D', 'H', 'T', 'min', 'S']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth adding a CFTIME_TICKS
variable in cftime_offsets.py
that we could import and use for these instance checks.
# pandas defines these offsets as "Tick" objects, which for instance have
# distinct behavior from monthly or longer frequencies in resample.
CFTIME_TICKS = (Day, Hour, Minute, Second)
Then here and in other methods this check could just be something like if not isinstance(freq, CFTIME_TICKS)
or if isinstance(freq, CFTIME_TICKS)
.
xarray/core/resample_cftime.py
Outdated
return fresult, lresult | ||
|
||
|
||
def _offset_timedelta(offset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to add an as_timedelta
method (where applicable) to the cftime offset objects. For example:
class Day(BaseCFTimeOffset):
_freq = 'D'
def as_timedelta(self):
return timedelta(days=self.n)
def __apply__(self, other):
return other + self.as_timedelta()
That way we would not need this method to do the translation.
xarray/core/resample_cftime.py
Outdated
base = base % offset.n | ||
start_day = normalize_date(first) | ||
base_td = datetime.timedelta(0) | ||
if offset._freq == 'D': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we defined an as_timedelta
method for cftime offsets (as described below), we could instead replace this conditional block with one line:
base_td = type(offset)(n=base).as_timedelta()
xarray/core/resample_cftime.py
Outdated
return binner, labels | ||
|
||
|
||
def _adjust_bin_edges(binner, ax_values, freq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method does not appear to be used in this implementation. Should it be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I also noticed I've been too slavishly copying pandas logic. This and some other unnecessary code have been/will be removed.
['left', 'right'], | ||
['2MS', '2M', '3MS', '3M', '7MS', '7M']))) | ||
def test_downsampler(closed, label, freq): | ||
downsamp_series = series(pd_index()).resample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like things might be more apples to apples in this test if we compared the results of resampling a DataArray indexed using a DatetimeIndex to the results of resampling a DataArray indexed using a CFTimeIndex (with a standard calendar type).
This would also allow us to make use of xarray's testing methods, like xarray.testing.assert_equal
, which checks the equivalence of two DataArrays (including the equivalence of their coordinates, and NaN placement).
xarray/core/resample.py
Outdated
# from ..coding.cftimeindex import CFTimeIndex | ||
import cftime as cf | ||
import numpy as np | ||
if isinstance(self._obj[self._dim].values[0], cf.datetime): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like if isinstance(self._obj.indexes[self._dim], CFTimeIndex)
might be cleaner here.
xarray/core/resample.py
Outdated
import numpy as np | ||
if isinstance(self._obj[self._dim].values[0], cf.datetime): | ||
t = self._obj[self._dim] | ||
x = np.insert([td.total_seconds() for td in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just make use of xarray.core.utils.datetime_to_numeric
here, e.g.
x = datetime_to_numeric(t, datetime_unit='s')
def test_downsampler(closed, label, freq): | ||
downsamp_series = series(pd_index()).resample( | ||
freq, closed=closed, label=label).mean().dropna() | ||
downsamp_da = da(xr_index()).resample( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest fixtures are typically provided as arguments to the test functions, rather than called directly.
|
||
@pytest.fixture() | ||
def xr_index(): | ||
return xr.cftime_range('2000-01-01', periods=30, freq='MS', tz='UTC') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cftime_range
does not support a tz
option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove that option but someone needs to fix this misleading doc
http://xarray.pydata.org/en/stable/generated/xarray.cftime_range.html
xarray.cftime_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, calendar='standard')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that! Indeed that signature is misleading; see #2613 for a fix.
@spencerkclark Thanks for the detailed review! I'll fix up my code over the next few days. I haven't completely solved the upsampling issue yet but I think I might have some clues as to what's happening. Timedelta operations on cftime.datetime does not always return correct values. Sometimes, they are a few microseconds or one second off. The issue can be sidestepped by shifting the the bins 1 second forward for |
… 2018-12-05 @max-sixty GitHub reviews for resample-v2-clean pull request). Not cleaned.
Get PEP8 changes from Ouranosinc.
…-v2-clean # Conflicts: # xarray/tests/test_cftimeindex_resample.py
… 2018-12-05 @max-sixty GitHub reviews for resample-v2-clean pull request). Cleaned.
# Conflicts: # xarray/core/resample.py # xarray/tests/test_cftimeindex_resample.py
Resample v2 clean
@jwenfai thanks for the updates. It looks like there are some merge conflicts that are preventing our CI from running. Could you please resolve those when you get chance, so we can see those results? |
Moved full_index and first_items generation logic to a helper function
Some connection error. I restarted Travis, let's see if this happens again. |
It looks like tests are passing now. I'm going to give this another look over and then (probably) merge |
xarray/core/groupby.py
Outdated
@@ -259,11 +259,8 @@ def __init__(self, obj, group, squeeze=False, grouper=None, bins=None, | |||
# TODO: sort instead of raising an error | |||
raise ValueError('index must be monotonic for resampling') | |||
s = pd.Series(np.arange(index.size), index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this object s
inside the helper function instead? It's not needed outside here
xarray/tests/test_formatting.py
Outdated
@@ -1,6 +1,8 @@ | |||
# -*- coding: utf-8 -*- | |||
from textwrap import dedent | |||
|
|||
from textwrap import dedent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like changing from another PR have leaked in here? Let's try to figure that out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to have accidentally crept in when I merged changes from pydata/master into my local repo 4~5 days back. Here's what I managed to trace (from latest to earliest instance of test_formatting.py being changed):
Ouranosinc#14
jwenfai@31ccebf
jwenfai@9fbb016
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it’s just a bad merge — these tests are now duplicated twice. You can simply delete the redundant code and push a new commit.
xarray/tests/test_formatting.py
Outdated
@@ -189,6 +189,53 @@ def test_attribute_repr(self): | |||
assert '\n' not in newlines | |||
assert '\t' not in tabs | |||
|
|||
def test_diff_dataset_repr(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to delete this repeated method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that, didn't know how I missed it.
xarray/core/groupby.py
Outdated
if isinstance(grouper, CFTimeGrouper): | ||
first_items = grouper.first_items(index) | ||
full_index = first_items.index | ||
if first_items.isnull().any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you merge in master against, you could switch this block back to using Series.dropna()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Removed redundant test and simplify code now that dropna is implemented.
I'm going to merge this when tests pass |
Sounds good @shoyer, thanks for bringing this to the finish line. |
All tests passed. Thanks, @spencerkclark and @shoyer, for all the help! |
thanks @jwenfai and @spencerkclark ! |
* master: remove xfail from test_cross_engine_read_write_netcdf4 (pydata#2741) Reenable cross engine read write netCDF test (pydata#2739) remove bottleneck dev build from travis, this test env was failing to build (pydata#2736) CFTimeIndex Resampling (pydata#2593) add tests for handling of empty pandas objects in constructors (pydata#2735) dropna() for a Series indexed by a CFTimeIndex (pydata#2734) deprecate compat & encoding (pydata#2703) Implement integrate (pydata#2653) ENH: resample methods with tolerance (pydata#2716) improve error message for invalid encoding (pydata#2730) silence a couple of warnings (pydata#2727)
I would appreciate some feedback on why this implementation for CFTimeIndex resampling doesn't match pandas' output 100%.
test_cftimeindex_resample.py
) created for standard calendars but not for non-standard calendars (360 days etc.). Contents fromtest_cftimeindex_resample.py
will be inserted intotest_dataarray.py
andtest_cftimeindex_resample.py
will be deleted once resampling implementation is finalized. Files found in thetests/temp
folder is meant to highlight resampling discrepancies between pandas and this implementaion. Both the files and the folder will be removed once resampling implementation is finalized.resample_cftime.py
. There were no doctrings from https://github.com/pandas-dev/pandas/blob/master/pandas/core/resample.py (which is where the codes were ported from) that could be conveniently copied.