When a column contains only NaN values, rdt fails to remove the NaN values from the output.
This happens because the NullTransformer replaces all null values with the average of the column, and that average is itself NaN when every value in the column is NaN.
A possible solution is to replace the average with a fixed value, such as 0, whenever the average turns out to be NaN.
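The proposed fallback can be sketched with plain pandas. This is only an illustration of the idea, not the actual NullTransformer code: the helper name `fill_nulls_with_mean` and its `fallback` parameter are hypothetical and do not exist in rdt.

```python
import numpy as np
import pandas as pd


def fill_nulls_with_mean(column: pd.Series, fallback: float = 0.0) -> pd.Series:
    """Hypothetical helper (not part of rdt): fill nulls with the column mean,
    using a fixed fallback value when the mean itself is NaN."""
    fill_value = column.mean()  # NaN when every value in the column is NaN
    if pd.isna(fill_value):
        fill_value = fallback
    return column.fillna(fill_value)


all_nan = pd.Series([np.nan, np.nan], name='a')

# Filling with the plain mean leaves the column unchanged, because the mean is NaN.
print(all_nan.fillna(all_nan.mean()).isna().all())

# With the fallback, the nulls are replaced by the fixed value.
print(fill_nulls_with_mean(all_nan).tolist())
```

Columns that contain at least one non-null value are unaffected by the fallback and are still filled with their mean.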
Example
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({
   ...:     'a': [np.nan],
   ...: })
In [4]: from rdt import HyperTransformer
In [5]: ht = HyperTransformer()
In [6]: ht.fit_transform(df)
Out[6]:
   a#0  a#1
0  NaN  1.0