Current Behaviour
Python's standard json implementation has a known issue: by default it produces invalid JSON. NaN and Infinity values are not supported by the JSON specification, but Python serializes those values anyway.
Because of that, the JSON returned by ProfileReport.to_json() is often not valid. For example, the example code below produces this output that other JSON implementations will fail to parse:
I did this on the spark-branch, commit 9017c4a5e26e22152ed3f24c5ec628f70859fa14
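The underlying standard-library behaviour can be seen without pandas-profiling. The snippet below is only an illustration, not the original reproduction code from this report; the dictionary keys are made up for the example.

```python
import json

# Hypothetical statistics containing non-finite floats.
values = {"mean": float("nan"), "max": float("inf")}

# By default (allow_nan=True) the json module emits the bare tokens
# NaN and Infinity, which are not part of the JSON grammar.
lenient = json.dumps(values)
print(lenient)

# With allow_nan=False the module refuses to serialize such values
# and raises ValueError instead.
strict_error = None
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    strict_error = exc
    print("strict mode:", exc)
```

Passing allow_nan=False is a cheap way to check whether a structure is safe to serialize, but it only detects the problem; it does not repair the data.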
Expected Behaviour
This is not really the fault of pandas-profiling, but it would be really nice if to_json could return valid JSON.
I see several ways to solve this:
Never actually return any NaNs in the produced statistics; always set such values to None. The problem with this approach is that some parts of the report contain actual input data, which could still contain NaNs.
Use a different JSON library that produces valid output.
(Maybe the easiest solution) Check for non-finite numbers in the existing encode_it function and replace them with None, so that the JSON will contain null values.
The changed encode_it function could look like this (I would be happy to send a pull request if that is the accepted solution):
def encode_it(o: Any) -> Any:
    if isinstance(o, dict):
        return {encode_it(k): encode_it(v) for k, v in o.items()}
    else:
        if isinstance(o, (bool, int, str)):
            return o
        if isinstance(o, float):
            if not math.isfinite(o):
                # Encode non-finite floats as None.
                # This is necessary because JSON does not support NaN/Infinity values.
                return None
            else:
                return o
        elif isinstance(o, list):
            [...]
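Since the snippet above is elided, here is a minimal standalone sketch of the same idea. It is not the actual encode_it function: the name replace_non_finite and the sample statistics are hypothetical, and it only handles dicts, lists, tuples and floats.

```python
import json
import math
from typing import Any


def replace_non_finite(o: Any) -> Any:
    """Recursively replace NaN and +/-Infinity floats with None.

    Hypothetical helper for illustration; not part of pandas-profiling.
    """
    if isinstance(o, dict):
        return {k: replace_non_finite(v) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [replace_non_finite(v) for v in o]
    if isinstance(o, float) and not math.isfinite(o):
        return None
    return o


# Made-up statistics, standing in for a report description.
stats = {"a": {"mean": 1.0, "skewness": float("nan"), "range": [0.0, float("inf")]}}

# allow_nan=False makes json.dumps raise if any non-finite float slipped through.
clean_json = json.dumps(replace_non_finite(stats), allow_nan=False)
print(clean_json)
```

Combining the replacement with allow_nan=False means any value type the helper misses fails loudly rather than silently producing invalid JSON.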
Well spotted @LukasBoersma! In addition to the proposals in your bug report, we could also consider writing non-finite floats as strings (which will result in a mixed-type field). This way there is no information loss.
At this moment my preference goes to your proposed solution above (perhaps with a parameter to toggle between the two ways of handling them). A PR is much appreciated; let's take it from there.
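One possible shape for such a toggle is sketched below. The function name encode_non_finite and the mode parameter with its "null" and "string" values are assumptions made for this illustration; nothing here reflects an agreed API.

```python
import json
import math
from typing import Any


def encode_non_finite(o: Any, mode: str = "null") -> Any:
    """Replace non-finite floats with None ("null") or their string form ("string").

    Hypothetical helper; the mode names are assumptions for this sketch.
    """
    if mode not in ("null", "string"):
        raise ValueError(f"unknown mode: {mode!r}")
    if isinstance(o, dict):
        return {k: encode_non_finite(v, mode) for k, v in o.items()}
    if isinstance(o, (list, tuple)):
        return [encode_non_finite(v, mode) for v in o]
    if isinstance(o, float) and not math.isfinite(o):
        # str() yields "nan", "inf" or "-inf", so no information is lost.
        return None if mode == "null" else str(o)
    return o


values = {"min": float("-inf"), "mean": float("nan"), "max": 2.5}
as_null = json.dumps(encode_non_finite(values, "null"), allow_nan=False)
as_string = json.dumps(encode_non_finite(values, "string"), allow_nan=False)
print(as_null)
print(as_string)
```

The string mode keeps the distinction between NaN and the two infinities, at the cost of a field whose values are sometimes numbers and sometimes strings, so consumers have to handle both.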
Hi Lukas, I would like to suggest some refinements to this code:
Make the global class variables private and access them through a function.
Rather than defining a single value, passing it through the config, and checking it in the profile report, create separate instances for it.
Data Description
DataFrame([1, 1, 1], columns=["a"])
Code that reproduces the bug
pandas-profiling version
spark-branch @ 9017c4a
Dependencies
OS
Windows 10
Checklist