-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG: nonexistent Timestamp pre-summer/winter DST w/dateutil timezone #31155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green.
cee72ed to
c9a87bd
Compare
| assert result == expected | ||
|
|
||
|
|
||
| def test_constructor_before_dst_switch(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test the example in the issue here? (i.e. that Timestamp(Timestamp(epoch_time, tz=..)) doesn't change value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Sure!
But the value breaks only 1 nanosecond before the switch. Currently, I'm struggling to make everything else work on every azure pipeline (see comment below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
Added to the test. Pinging you on change.
Remove exception from test_dti_construction_onexistent_endpoint
|
@jreback They had this for Python 2.6 datetime support and dropped it in this PR. I set up a python 3.6 environment and tested this hypothesis. The tests fail with dateutil 2.6.1 and pass with dateutil 2.7.0. Basically, I want to know how I should proceed. |
|
so just change the test have different cases for >=2,7 and <2,7 |
| Tests for DatetimeIndex timezone-related methods | ||
| """ | ||
| from datetime import date, datetime, time, timedelta, tzinfo | ||
| from distutils.version import LooseVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to parse version properly, so have to import this.
| ts = Timestamp(epoch, tz="dateutil/US/Pacific") | ||
| result = ts.tz.dst(ts) | ||
| expected = timedelta(seconds=0) | ||
| assert Timestamp(ts).value == epoch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke
This tests that the value no longer shifts when we call the constructor again. In the issue, this failed for epoch = 1552211999999999999
|
Because |
|
@jreback |
|
thanks @AlexKirko very nice |
| # Make sure that calling Timestamp constructor | ||
| # on time just before DST switch doesn't lead to | ||
| # nonexistent time or value change | ||
| # Works only with dateutil >= 2.7.0 as dateutil overrid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/overrid/overrode
| # nonexistent time or value change | ||
| # Works only with dateutil >= 2.7.0 as dateutil overrid | ||
| # pandas.Timedelta.total_seconds with | ||
| # datetime.timedelta.total_seconds before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not make sense to me - datetime.timedelta.total_seconds should succeed, because it's equivalent to:
def total_seconds(td):
useconds = td.days * 86400
useconds += td.seconds
useconds *= 1000000
useconds += td.microseconds
return useconds / 1e6I think there's actually a deeper issue here, which is that td.microseconds and td.seconds are rounded rather than truncated. Consider this:
def to_ns(td):
ns = td.days * 86400
ns += td.seconds
ns *= 1000000
ns += td.microseconds
ns *= 1000
ns += td.nanoseconds
return ns
td = timedelta(1552211999999999872, unit="ns")
print(td.value) # 1552211999999999872
print(to_ns(td)) # 1552212000000000872That seems to be the actual root cause of this issue and should probably be fixed.
| # Works only with dateutil >= 2.7.0 as dateutil overrid | ||
| # pandas.Timedelta.total_seconds with | ||
| # datetime.timedelta.total_seconds before | ||
| ts = Timestamp(epoch, tz="dateutil/US/Pacific") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention earlier: you should probably use dateutil/America/Los_Angeles, as that is the canonical name for this zone. The US/... zones are symlinks for backwards compatibility.
black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffThis implements rounding down to microseconds into Timedelta.total_seconds(). Lack of this rounding led to
dateutil.tz.tzinfo.utcoffsetanddstbreaking less than 128 nanoseconds before winter/summer DST switch. This happened due to an unintended cast to float inTimedelta.total_seconds()compounded withTimedelta.valuebeing np.int64 type which doesn't support long arithmetic. The loss of precision led to rounding up, which meant thattotal_secondssince epoch time implied DST time while theTimedelta.valuehasn't yet reached DST.Details and code examples below.
This was quite a journey, but I found out what's going on. Let's say we have a
dateutil.tz.tzinfoobject nameddu_tzand want to find out the DST-aware UTC offset.du_tz.utcoffset(dt)which callsdu_tz._find_ttinfo(dt)which callsdu_tz._resolve_ambiguous_time(dt)to find the index of the last transition time that it uses to return the correct offset.du_tz._resolve_ambiguous_time(dt)callsdu_tz._find_last_transition(dt)which calls the_datetime_to_timestamp(dt)dateutil function.The problem is dateutil's reliance on
Timedelta.total_secondswhich casts to float:Demonstration:
The same thing happens with a
pytztimezone, only pytz relies on something else to check for DST transitions:So
timedeltavalue is okay, buttimedelta.total_secondshits the precision limit of floats, and this leads to rounding and an incorrect DST offset.