Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix minor bugs in Perspective datetime + NaN handling #1028

Merged
merged 5 commits into from
May 6, 2020
Merged

Conversation

sc1f
Copy link
Contributor

@sc1f sc1f commented Apr 28, 2020

This PR fixes a few small bugs in perspective-python:

  • NaN, Infinity, and -Infinity are no longer serialized into JSON by PerspectiveManager, and any remote calls into PerspectiveManager that returns a result with NaN inside will be rejected on the client-side. The server will not raise an Exception, but the exception will be visible inside the client browser console.

A Short Explanation

Perspective's data loading and model treats None or null as invalid values, and the Javascript value undefined as a value for "nothing exists in this cell". To the extent possible, Perspective will strip and treat NaNs as None (in Python) or null (in Javascript).

A case where NaNs cannot be coerced to None is in Arrows with columns that contain NaN values but do not have the null_count or the Arrow array's validity bitmap correctly set. To take advantage of the speed benefits provided by Arrow, Perspective treats Arrow-formatted data as strongly typed within each column, and where invalid values are indicated through use of the validity bitmap. This allows Perspective to memcpy arrays from Arrow into Perspective, i.e. a basically free operation that massively speeds up Arrow loading.

At this time, there are no plans to support a distinction between NaN and None in Perspective, especially as the visualization value of NaN and None are equivalent - both denote invalid values. Thus, using perspective-python and attempting to serialize data containing NaN values will fail, and this change provides an explicit error message.

Additionally, PyArrow's default behavior when converting from Pandas is to treat NaN values as None.

Other Changes

  • Datetimes before the year 1900 are treated automatically as UTC using calendar.timegm, instead of time.mktime which raised an error whenever dealing with years before 1900.

  • datetime.min values passed in to Perspective used to return "2001/1/1" due to a edge case in how timestamps were calculated before being passed into C++. Now, datetime.min values (whenever year, month, and day are 1 and hour, minute, seconds are 0) return a timestamp of 0, i.e. the Unix epoch.

@sc1f sc1f added internal Internal refactoring and code quality improvement Python labels Apr 28, 2020
@sc1f sc1f self-assigned this Apr 28, 2020
Copy link
Member

@texodus texodus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good!

def test_table_datetime_infer_from_string_with_time(self):
data = {"a": ["11:00 ABCD"]}
tbl = Table(data)
assert tbl.schema() == {"a": str}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we definitely need to unify on one implementation of this and port it to C++

# Augment the default error message when invalid values are
# detected, i.e. `NaN`
if error_string == "Out of range float values are not JSON compliant":
error_string = "Cannot serialize `NaN`, `Infinity` or `-Infinity` to JSON."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very user friendly! Tragic that we must resort to this tho ...

@texodus texodus merged commit 391777c into master May 6, 2020
@texodus texodus deleted the bug-sprint branch May 6, 2020 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal Internal refactoring and code quality improvement Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants