-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix minor bugs in Perspective datetime + NaN handling #1028
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good!
def test_table_datetime_infer_from_string_with_time(self): | ||
data = {"a": ["11:00 ABCD"]} | ||
tbl = Table(data) | ||
assert tbl.schema() == {"a": str} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we definitely need to unify on one implementation of this and port it to C++
# Augment the default error message when invalid values are | ||
# detected, i.e. `NaN` | ||
if error_string == "Out of range float values are not JSON compliant": | ||
error_string = "Cannot serialize `NaN`, `Infinity` or `-Infinity` to JSON." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very user friendly! Tragic that we must resort to this tho ...
This PR fixes a few small bugs in
perspective-python
:NaN
,Infinity
, and-Infinity
are no longer serialized into JSON byPerspectiveManager
, and any remote calls intoPerspectiveManager
that returns a result withNaN
inside will be rejected on the client-side. The server will not raise an Exception, but the exception will be visible inside the client browser console.A Short Explanation
Perspective's data loading and model treats
None
ornull
as invalid values, and the Javascript valueundefined
as a value for "nothing exists in this cell". To the extent possible, Perspective will strip and treatNaN
s asNone
(in Python) ornull
(in Javascript).A case where
NaN
s cannot be coerced toNone
is in Arrows with columns that containNaN
values but do not have thenull_count
or the Arrow array's validity bitmap correctly set. To take advantage of the speed benefits provided by Arrow, Perspective treats Arrow-formatted data as strongly typed within each column, and where invalid values are indicated through use of the validity bitmap. This allows Perspective tomemcpy
arrays from Arrow into Perspective, i.e. a basically free operation that massively speeds up Arrow loading.At this time, there are no plans to support a distinction between
NaN
and None in Perspective, especially as the visualization value ofNaN
andNone
are equivalent - both denote invalid values. Thus, usingperspective-python
and attempting to serialize data containingNaN
values will fail, and this change provides an explicit error message.Additionally, PyArrow's default behavior when converting from Pandas is to treat
NaN
values asNone
.Other Changes
Datetimes before the year 1900 are treated automatically as UTC using
calendar.timegm
, instead oftime.mktime
which raised an error whenever dealing with years before 1900.datetime.min
values passed in to Perspective used to return "2001/1/1" due to a edge case in how timestamps were calculated before being passed into C++. Now, datetime.min values (whenever year, month, and day are1
and hour, minute, seconds are0
) return a timestamp of 0, i.e. the Unix epoch.