Description
In pyarrow we differentiate between missing (null) values, which we define with a validity bitmask, and NaN float values.
From the dataframe interchange protocol specification, we understand that NaN can be used to indicate missing values, but that this is not required: NaN can also be a valid value (see dataframe-api/protocol/dataframe_protocol.py, lines 195 to 213 in commit 4f7c1e0).
This leads to a discrepancy between pyarrow and pandas, for example, where NaN is turned into a missing value. However, we do not think it would be correct for pyarrow to change the null_count property, as the information about the difference would be lost for the libraries that could benefit from it. Also, the validity bitmask and the null_count would then have to be made consistent with each other.
Is there a way for a library to keep the behaviour of not treating NaNs as nulls?
(Connected issue in the arrow repo apache/arrow#34774)