
BUG: Fix some PeriodIndex resampling issues #16153


Merged
20 commits
ca7b6f2
CLN: move PeriodIndex binning code to TimeGrouper
winklerand Apr 26, 2017
c27f430
TST/CLN: raise error when resampling with on= or level= selection
winklerand Apr 26, 2017
390e16e
BUG: resampling PeriodIndex now returns PeriodIndex (GH 12884, 15944)
winklerand Apr 26, 2017
23566c2
BUG: OHLC-upsampling of PeriodIndex now returns DataFrame (GH 13083)
winklerand Apr 26, 2017
a82879d
BUG: enable resampling with NaT in PeriodIndex (GH 13224)
winklerand Apr 26, 2017
4b1c740
CLN: remove warning on falling back to tstamp resampling with loffset
winklerand Apr 30, 2017
73c0990
CLN: use memb._isnan for NaT masking
winklerand May 1, 2017
fa6c1d3
DOC: added issue reference for OHLC resampling
winklerand May 1, 2017
7ea04e9
STYLE: added blank lines
winklerand May 1, 2017
82a8275
TST: convert to parametrized tests / pytest idiom
winklerand May 6, 2017
432c623
CLN/TST: call assert_almost_equal() when comparing Series/DataFrames
winklerand May 6, 2017
c8814fb
STYLE: added blank lines, removed odd whitespace, fixed typo
winklerand May 13, 2017
486ad67
TST: add test case for multiple consecutive NaTs in PeriodIndex
winklerand May 13, 2017
ad8519f
TST/DOC: added issue number to test case
winklerand May 13, 2017
39fc7e2
TST: consolidate test_asfreq_downsample, test_asfreq_upsample -> test…
winklerand May 13, 2017
efcad5b
TST: set fixtures to default function scoping
winklerand May 13, 2017
41401d4
TST: convert constant 'setup-like' values/objects to pytest fixtures
winklerand May 13, 2017
398a684
DOC: whatsnew v0.21.0 entry (in API changes section)
winklerand May 21, 2017
8358c41
fixups
jreback Sep 28, 2017
6084e0c
moar whatsnew
jreback Sep 29, 2017
BUG: enable resampling with NaT in PeriodIndex (GH 13224)
winklerand authored and jreback committed Sep 29, 2017
commit a82879d74f407bb605550e77244947b7588cde5b
32 changes: 28 additions & 4 deletions pandas/core/resample.py
@@ -1270,18 +1270,34 @@ def _get_period_bins(self, ax):
             raise TypeError('axis must be a PeriodIndex, but got '
                             'an instance of %r' % type(ax).__name__)

-        if not len(ax):
+        memb = ax.asfreq(self.freq, how=self.convention)
+        # NaT handling as in pandas._lib.lib.generate_bins_dt64()
Contributor

blank line

Contributor Author

done

+        nat_count = 0
+        if memb.hasnans:
+            import warnings
+            with warnings.catch_warnings():
Contributor

you don't need this

memb._isnan already has this mask.
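
The mask the reviewer mentions can be illustrated without the `warnings` workaround. The sketch below is an illustration only, not the code in this commit: `_isnan` is a private attribute, so the example assumes the public `isna()` method is an acceptable stand-in, and the sample index values are made up.

```python
import pandas as pd

# Hypothetical sample index with two NaT entries (values chosen for illustration).
pi = pd.PeriodIndex(
    [pd.NaT, "1970-01-01 00:00:00", pd.NaT, "1970-01-01 00:00:01"],
    freq="s",
)

# Boolean mask of the missing entries, obtained through the public API.
nat_mask = pi.isna()
nat_count = int(nat_mask.sum())

# Keep only the valid (non-NaT) periods, analogous to memb[~nat_mask] in the diff.
valid = pi[~nat_mask]

print(nat_count, len(valid))
```

No element-wise comparison against `NaT` is needed here, which is why the `FutureWarning` suppression in the diff becomes unnecessary once a ready-made mask is used.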

+                warnings.filterwarnings('ignore', 'numpy equal will not check '
+                                        'object identity')
+                nat_mask = memb.base == tslib.NaT
+                # raises "FutureWarning: numpy equal will not check object
+                # identity in the future. The comparison did not return the
+                # same result as suggested by the identity (`is`)) and will
+                # change."
+            nat_count = np.sum(nat_mask)
+            memb = memb[~nat_mask]
+
+        # if index contains no valid (non-NaT) values, return empty index
+        if not len(memb):
             binner = labels = PeriodIndex(
Contributor

You could use _shallow_copy here, but this is OK

Contributor (jreback, Sep 29, 2017)

I left this, ok for now.

                 data=[], freq=self.freq, name=ax.name)
             return binner, [], labels

-        start = ax[0].asfreq(self.freq, how=self.convention)
-        end = ax[-1].asfreq(self.freq, how='end')
+        start = ax.min().asfreq(self.freq, how=self.convention)
+        end = ax.max().asfreq(self.freq, how='end')

         labels = binner = PeriodIndex(start=start, end=end,
                                       freq=self.freq, name=ax.name)

-        memb = ax.asfreq(self.freq, how=self.convention)
         i8 = memb.asi8
         freq_mult = self.freq.n
         # when upsampling to subperiods, we need to generate enough bins
Contributor

blank line

Contributor Author

done
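
The hunk above replaces `ax[0]` / `ax[-1]` with `ax.min()` / `ax.max()` when computing `start` and `end`. A minimal sketch of why that matters when the index contains `NaT`, using a made-up index (the values and the lowercase `"s"` alias are my own choices, not taken from the PR):

```python
import pandas as pd

# Hypothetical index whose first and last entries are NaT.
pi = pd.PeriodIndex(
    [pd.NaT, "1970-01-01 00:00:00", "1970-01-01 00:00:02", pd.NaT],
    freq="s",
)

# Positional access returns whatever sits at the ends, here NaT.
first, last = pi[0], pi[-1]

# min()/max() skip NaT by default and return the valid extremes.
start, end = pi.min(), pi.max()

print(first, last)
print(start, end)
```

With positional access, a `NaT` at either end would leave no usable bound for the binner; `min()` and `max()` avoid that and also do not rely on the index being sorted.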

@@ -1291,6 +1307,14 @@ def _get_period_bins(self, ax):
         rng += freq_mult
         bins = memb.searchsorted(rng, side='left')

+        if nat_count > 0:
+            # NaT handling as in pandas._lib.lib.generate_bins_dt64()
Contributor

is this path tested sufficiently, e.g. 0, 1, 2 NaT?

Contributor Author

added a test case for consecutive NaTs in the index (1cad7fa)

Should be sufficiently tested, cases covered:

  • 0 NaT: basically all other resampling tests
  • multiple single NaTs (at beginning, inside and end of index)
  • consecutive NaTs (at beginning, inside and end of index)

Any ideas for more exhaustive test cases?

+            # shift bins by the number of NaT
+            bins += nat_count
+            bins = np.insert(bins, 0, nat_count)
+            binner = binner.insert(0, tslib.NaT)
+            labels = labels.insert(0, tslib.NaT)
+
         return binner, bins, labels
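
The arithmetic of the `if nat_count > 0:` branch can be sketched with plain NumPy. This is an illustration only: small integers stand in for period ordinals, and the values of `memb`, `rng` and `nat_count` are invented for the example rather than taken from pandas internals.

```python
import numpy as np

# Stand-ins for the ordinals of the valid (non-NaT) members, already sorted.
memb = np.array([0, 1, 2])
# Stand-in for the right edges of three one-unit bins.
rng = np.array([1, 2, 3])
# Pretend two NaT entries were masked out earlier.
nat_count = 2

# Bin edges expressed as positions within the valid members.
bins = memb.searchsorted(rng, side="left")  # -> [1, 2, 3]

# Shift every edge past the NaT entries, then prepend one extra
# edge that closes a leading bin holding exactly the NaT entries.
bins = bins + nat_count                     # -> [3, 4, 5]
bins = np.insert(bins, 0, nat_count)        # -> [2, 3, 4, 5]

print(bins.tolist())
```

The extra leading edge is what pairs with the `NaT` label inserted at position 0 of `binner` and `labels` in the diff.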


39 changes: 39 additions & 0 deletions pandas/tests/test_resample.py
@@ -2913,6 +2913,45 @@ def test_upsampling_ohlc_freq_multiples(self):
         result = s.resample('12H', kind='period').ohlc()
         assert_frame_equal(result, expected)

+    def test_resample_with_nat(self):
+        # GH 13224
+        index = PeriodIndex([pd.NaT, '1970-01-01 00:00:00', pd.NaT,
+                             '1970-01-01 00:00:01', '1970-01-01 00:00:02'],
+                            freq='S')
+        frame = DataFrame([2, 3, 5, 7, 11], index=index)
+
+        index_1s = PeriodIndex(['1970-01-01 00:00:00', '1970-01-01 00:00:01',
+                                '1970-01-01 00:00:02'], freq='S')
+        frame_1s = DataFrame([3, 7, 11], index=index_1s)
+        result_1s = frame.resample('1s').mean()
+        assert_frame_equal(result_1s, frame_1s)
+
+        index_2s = PeriodIndex(['1970-01-01 00:00:00',
+                                '1970-01-01 00:00:02'], freq='2S')
+        frame_2s = DataFrame([5, 11], index=index_2s)
+        result_2s = frame.resample('2s').mean()
+        assert_frame_equal(result_2s, frame_2s)
+
+        index_3s = PeriodIndex(['1970-01-01 00:00:00'], freq='3S')
+        frame_3s = DataFrame([7], index=index_3s)
+        result_3s = frame.resample('3s').mean()
+        assert_frame_equal(result_3s, frame_3s)
+
+        pi = PeriodIndex(['1970-01-01 00:00:00', pd.NaT,
+                          '1970-01-01 00:00:02'], freq='S')
+        frame = DataFrame([2, 3, 5], index=pi)
+        expected_index = period_range(pi[0], periods=len(pi), freq=pi.freq)
+        expected = DataFrame([2, np.NaN, 5], index=expected_index)
+        result = frame.resample('1s').mean()
+        assert_frame_equal(result, expected)
+
+        pi = PeriodIndex([pd.NaT] * 3, freq='S')
Contributor

issue number for this?

Contributor Author

done (not mentioned in the issue explicitly, is just an edge case)

+        frame = DataFrame([2, 3, 5], index=pi)
+        expected_index = PeriodIndex(data=[], freq=pi.freq)
+        expected = DataFrame([], index=expected_index)
+        result = frame.resample('1s').mean()
+        assert_frame_equal(result, expected)
+

 class TestTimedeltaIndex(Base):
     _index_factory = lambda x: timedelta_range
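
One case in `test_resample_with_nat` builds its expected index with `period_range`, so the slot occupied by `NaT` in the input corresponds to the missing second in the expected output. The sketch below reproduces only that index construction and deliberately does not call `resample`; the lowercase `"s"` alias is my own choice here rather than the `'S'` used in the test.

```python
import pandas as pd

# Same shape as the test input: a NaT between two valid one-second periods.
pi = pd.PeriodIndex(
    ["1970-01-01 00:00:00", pd.NaT, "1970-01-01 00:00:02"], freq="s"
)

# Contiguous range starting at the first period, one entry per input row.
expected_index = pd.period_range(pi[0], periods=len(pi), freq=pi.freq)

print(expected_index)
```

Because the range is contiguous, the middle entry is the period for `00:00:01`, which is where the test expects a `NaN` value.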