Skip to content

Fix DTI comparison with None, datetime.date #19301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 2, 2018
Merged
Changes from 1 commit
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
rename test_dti_cmp_invalid
  • Loading branch information
jbrockmendel committed Jan 23, 2018
commit 39ad04d9e49dce9c334afdf0729571845d04db6c
2 changes: 1 addition & 1 deletion pandas/tests/indexes/datetimes/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ class TestDatetimeIndexComparisons(object):
@pytest.mark.parametrize('other', [None,
datetime(2016, 1, 1).date(),
np.nan])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add pd.NaT here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, pd.NaT behaves differently from None or np.nan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? that would be troubling, why is that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The __eq__ and __ne__ behave the same, but None and np.nan raise TypeError for the inequalities, whereas pd.NaT returns False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. ok then split the tests to reflect this, testing all nulls at once for eq/ne make the test easy to grok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the test easy to grok.

I'll do this, but object to the notions that fixtures and easy-to-grok can go in the same file. If I can't import the test and run it in the interpreter, then I'm in a Strange Land.

def test_dti_cmp_invalid(self, tz, other):
def test_dti_cmp_non_datetime(self, tz, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dti_cmp_null_scalar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# GH#19301
dti = pd.date_range('2016-01-01', periods=2, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this saying that

DatetimeIndex([None, '2016-01-01']) == [None, datetime.date(2016, 1, 1)])

is [False, False]? I thought that in #18188 we decided that DatetimeIndex compared with datetime.date would coerce the date to a datetime at midnight?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I think part of the confusion is over Timestamp comparison vs DatetimeIndex comparison vs Series[datetime64] comparison (e.g. the whatsnew note in #18188 talks about Series (but puts the tests in index tests...)). This PR and a bunch of other recent ones have been focused on making the Index/Series behavior more consistent.

Following this, #19288 can be de-kludged to make Series[datetime64] comparison dispatch to DatetimeIndex comparison, ensuring internal consistency. But you're right that this would mean a change in the behavior of Series[datetime64] comparisons.

For the moment I'm taking Timestamp behavior as canonical and making DatetimeIndex match that.

ts = pd.Timestamp('2016-01-01')
>>> ts == ts.to_pydatetime().date()
False
>>> ts < ts.to_pydatetime().date()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: Cannot compare type 'Timestamp' with type 'date'

I recall a discussion of whether date should be treated as datetime-at-midnight for Timestamp comparison purposes, my thought being it should be treated as Period(..., freq='D').

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: ts = pd.Timestamp('2016-01-01')

In [2]: ts
Out[2]: Timestamp('2016-01-01 00:00:00')

In [3]: ts.date()
Out[3]: datetime.date(2016, 1, 1)

In [4]: ts.to_pydatetime()
Out[4]: datetime.datetime(2016, 1, 1, 0, 0)

In [5]: ts.to_pydatetime() == ts.date()
Out[5]: False

In [6]: ts.to_pydatetime().date() == ts.date()
Out[6]: True

I find [5] a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find [5] a bit odd.

The behavior is analogous to comparing Timestamp vs Period. eq and ne return False and True, respectively, and all others raise TypeError. Its odd if you interpret date as "datetime implicitly at midnight", but pretty intuitive if you interpret it as "less specific than a timestamp"


Expand Down