BUG: underflow on Timestamp creation #14433

Merged (3 commits) on Oct 20, 2016
BUG: underflow on Timestamp creation
chris-b1 committed Oct 20, 2016
commit 866475710038c9034a9169640e1ab6dcbfea7d9c
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.19.1.txt
@@ -46,6 +46,7 @@ Bug Fixes
- ``pd.merge()`` will raise ``ValueError`` with non-boolean parameters in passed boolean type arguments (:issue:`14434`)


- Bug in ``Timestamp`` where dates very near the minimum (1677-09) could underflow on creation (:issue:`14415`)

- Bug in ``pd.concat`` where names of the ``keys`` were not propagated to the resulting ``MultiIndex`` (:issue:`14252`)
- Bug in ``pd.concat`` where ``axis`` cannot take string parameters ``'rows'`` or ``'columns'`` (:issue:`14369`)
21 changes: 14 additions & 7 deletions pandas/src/datetime/np_datetime.c
@@ -846,7 +846,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt;
@@ -860,7 +861,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / 60;
@@ -875,7 +877,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / (60*60);
@@ -891,7 +894,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / (60*60*1000LL);
@@ -908,7 +912,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / (60*60*1000000LL);
@@ -925,7 +930,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / (60*60*1000000000LL);
@@ -943,7 +949,8 @@ convert_datetime_to_datetimestruct(pandas_datetime_metadata *meta,
dt = dt % perday;
}
else {
set_datetimestruct_days((dt - (perday-1)) / perday, out);
set_datetimestruct_days(dt / perday - (dt % perday == 0 ? 0 : 1),
out);
dt = (perday-1) + (dt + 1) % perday;
}
out->hour = dt / (60*60*1000000000000LL);
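For context on the change repeated in the hunks above: in the negative branch, the old expression (dt - (perday-1)) / perday implements floor division by first subtracting perday - 1, and for values close to INT64_MIN that subtraction steps below the int64 minimum. The replacement gets the same floor division from truncating division plus a conditional adjustment, so no intermediate result leaves the int64 range. Below is a minimal standalone C sketch of the new arithmetic (illustration only, not part of the patch); it assumes perday is nanoseconds per day, as in the PANDAS_FR_ns case, and uses the value from the new test.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    const int64_t perday = 86400LL * 1000000000LL;       /* nanoseconds per day */
    const int64_t dt = INT64_MIN + 80000000000000LL;     /* value used in the new test */

    /* Old form: dt - (perday - 1) would have to go below INT64_MIN here,
       which is signed overflow, so it cannot be evaluated safely. */

    /* New form: truncating division, adjusted down by one when there is a
       remainder, gives floor division without leaving the int64 range. */
    int64_t days = dt / perday - (dt % perday == 0 ? 0 : 1);

    printf("days relative to 1970-01-01: %" PRId64 "\n", days);  /* -106752, i.e. 1677-09-21 */
    return 0;
}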
9 changes: 9 additions & 0 deletions pandas/tseries/tests/test_timeseries.py
@@ -4463,6 +4463,15 @@ def test_basics_nanos(self):
self.assertEqual(stamp.microsecond, 0)
self.assertEqual(stamp.nanosecond, 500)

# GH 14415
val = np.iinfo(np.int64).min + 80000000000000
stamp = Timestamp(val)
self.assertEqual(stamp.year, 1677)
self.assertEqual(stamp.month, 9)
self.assertEqual(stamp.day, 21)
self.assertEqual(stamp.microsecond, 145224)
self.assertEqual(stamp.nanosecond, 192)

def test_unit(self):

def check(val, unit=None, h=1, s=1, us=0):
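As a cross-check of the expected values in the GH 14415 assertions added above: with val = np.iinfo(np.int64).min + 80000000000000, the floored day count is -106752 (which lands on 1677-09-21), and the nanoseconds remaining within that day end in ...145224192, i.e. microsecond 145224 and nanosecond 192. A small C sketch of that arithmetic (illustration only; it mirrors, rather than calls, the conversion code):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    const int64_t perday = 86400LL * 1000000000LL;       /* nanoseconds per day */
    const int64_t val = INT64_MIN + 80000000000000LL;    /* same value as the test */

    int64_t days = val / perday - (val % perday == 0 ? 0 : 1);  /* floor division: -106752 */
    int64_t rem  = val % perday;                                 /* truncating remainder    */
    if (rem < 0)
        rem += perday;                                           /* ns since midnight       */

    printf("days: %" PRId64 "\n", days);                          /* -106752 */
    printf("microsecond: %" PRId64 "\n", (rem / 1000) % 1000000); /* 145224  */
    printf("nanosecond: %" PRId64 "\n", rem % 1000);              /* 192     */
    return 0;
}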
12 changes: 8 additions & 4 deletions pandas/tslib.pyx
@@ -24,6 +24,11 @@ from cpython cimport (
PyUnicode_AsUTF8String,
)


cdef extern from "headers/stdint.h":
enum: INT64_MAX
enum: INT64_MIN

Contributor:
shouldn't these be imported from the numpy headers? for consistency

Contributor Author:
This is how we do it in lib.pyx, but agreed it probably makes more sense to pull it from the numpy headers.

Contributor:
ok, no big deal. Ideally this should be consistent across the code; maybe just import into util.pxd and use from there?

# Cython < 0.17 doesn't have this in cpython
cdef extern from "Python.h":
cdef PyTypeObject *Py_TYPE(object)
@@ -904,10 +909,9 @@ cpdef object get_value_box(ndarray arr, object loc):


# Add the min and max fields at the class level
# These are defined as magic numbers due to strange
# wraparound behavior when using the true int64 lower boundary
cdef int64_t _NS_LOWER_BOUND = -9223285636854775000LL
cdef int64_t _NS_UPPER_BOUND = 9223372036854775807LL
# INT64_MIN is reserved for NaT
cdef int64_t _NS_LOWER_BOUND = INT64_MIN + 1
Contributor:
why change this?
iirc some platforms don't play nice with this definition

Contributor Author:
The old lower bound was higher than it needed to be, because of this issue; although we didn't actually enforce it everywhere. Upper bound is the same.

I can go back to hardcoded numbers, just thought the defined constants were clearer - I assumed INT64_MAX is platform independent (could be wrong)

Contributor:
it's fine; I don't know why these were hard coded in the first place.

Contributor Author:
On further look, I'll change the lower bound back - in our compat with python datetime we assume in several places that the min value has a 0 nanosecond unit - I think it could be worked around, but fairly invasive for another 800 ns of span.

cdef int64_t _NS_UPPER_BOUND = INT64_MAX

cdef pandas_datetimestruct _NS_MIN_DTS, _NS_MAX_DTS
pandas_datetime_to_datetimestruct(_NS_LOWER_BOUND, PANDAS_FR_ns, &_NS_MIN_DTS)
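To put numbers on the bounds discussion above: the previously hard-coded lower bound and INT64_MIN + 1 differ by 86,400,000,000,807 ns, i.e. one full day (the slack that existed because of this underflow bug) plus the roughly 800 ns of extra span mentioned in the last comment, with INT64_MIN itself still reserved for NaT. A quick C check of that arithmetic (illustration only):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    const int64_t old_lower = -9223285636854775000LL;   /* previous hard-coded _NS_LOWER_BOUND */
    const int64_t new_lower = INT64_MIN + 1;             /* INT64_MIN itself is kept for NaT    */
    const int64_t perday    = 86400LL * 1000000000LL;    /* nanoseconds per day                 */

    int64_t gained = old_lower - new_lower;              /* extra representable span, in ns     */
    printf("gained: %" PRId64 " ns = %" PRId64 " day(s) + %" PRId64 " ns\n",
           gained, gained / perday, gained % perday);    /* 86400000000807 = 1 day + 807 ns     */
    return 0;
}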