ENH: add mask-aware implementation of factorize algos #30037
Labels
Algos
Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Enhancement
ExtensionArray
Extending pandas with custom dtypes or arrays.
NA - MaskedArrays
Related to pd.NA and nullable extension arrays
Now that we are starting to have mask-based dtypes/arrays (integer, boolean), we should also look into making our algos work with such masked arrays. An example for which we could explore this is factorize/unique.

Currently, BooleanArray and IntegerArray need to convert their masked array into a single numpy array using a specified "NA sentinel" that the algo can recognize. This happens through ExtensionArray._values_for_factorize, which returns a (numpy array, NA sentinel) tuple. In practice this means that a BooleanArray is converted to an integer array (with NA as -1), and an IntegerArray is converted to a float array with NA as NaN, so the algos can handle this.
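To make the sentinel round trip concrete, here is a minimal pure-Python sketch of the conversion described above. It is not the pandas implementation: `to_sentinel_values`, `is_na`, and `factorize_with_sentinel` are hypothetical names, and plain lists and a dict stand in for numpy arrays and the hashtable code.

```python
import math


def to_sentinel_values(data, mask, kind):
    """Collapse (data, mask) into one list plus an NA sentinel.

    Hypothetical helper mirroring the conversion described above:
    boolean -> integers with -1 for NA, integer -> floats with NaN for NA.
    """
    if kind == "boolean":
        return [-1 if m else int(v) for v, m in zip(data, mask)], -1
    if kind == "integer":
        return [math.nan if m else float(v) for v, m in zip(data, mask)], math.nan
    raise ValueError(f"unsupported kind: {kind!r}")


def is_na(value, na_sentinel):
    # NaN never compares equal to itself, so it needs a dedicated check.
    if isinstance(na_sentinel, float) and math.isnan(na_sentinel):
        return isinstance(value, float) and math.isnan(value)
    return value == na_sentinel


def factorize_with_sentinel(values, na_sentinel):
    """Return (codes, uniques); positions holding the sentinel get code -1."""
    table = {}    # value -> code, standing in for a hashtable
    codes = []
    uniques = []
    for v in values:
        if is_na(v, na_sentinel):
            codes.append(-1)
            continue
        if v not in table:
            table[v] = len(uniques)
            uniques.append(v)
        codes.append(table[v])
    return codes, uniques


# Boolean data with one missing entry becomes integers with -1 for NA.
bool_values, bool_na = to_sentinel_values(
    [True, False, True], [False, True, False], "boolean"
)

# Integer data with one missing entry becomes floats with NaN for NA.
int_values, int_na = to_sentinel_values(
    [3, 0, 3, 7], [False, True, False, False], "integer"
)
int_codes, int_uniques = factorize_with_sentinel(int_values, int_na)
```

In this sketch the integer uniques come back as floats (`3.0`, `7.0`), so the caller has to cast them back to the original dtype afterwards.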
We should look into:
ExtensionArray._values_for_factorize that then returns (array, mask) instead of (array, NA).
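As a rough illustration of what an (array, mask) based factorize could look like, the sketch below takes the data and the mask separately and never consults a sentinel. It is only an assumption about the shape of such an algorithm: `factorize_masked` is a hypothetical name, and plain Python lists and a dict stand in for numpy arrays and the hashtable implementation.

```python
def factorize_masked(data, mask):
    """Factorize `data`, treating positions where `mask` is True as missing.

    Returns (codes, uniques). Missing positions get code -1 and the value
    stored under the mask is never inspected, so no sentinel is needed.
    """
    if len(data) != len(mask):
        raise ValueError("data and mask must have the same length")

    table = {}    # value -> code, standing in for a hashtable
    codes = []
    uniques = []
    for value, is_missing in zip(data, mask):
        if is_missing:
            codes.append(-1)
            continue
        code = table.get(value)
        if code is None:
            # First time this value is seen: assign the next code.
            code = len(uniques)
            table[value] = code
            uniques.append(value)
        codes.append(code)
    return codes, uniques


# The 0 at index 1 is a placeholder hidden by the mask; the 0 at index 4 is real.
data = [3, 0, 3, 7, 0]
mask = [False, True, False, False, False]
codes, uniques = factorize_masked(data, mask)
```

Because the mask is passed alongside the data, the uniques keep their original integer type and a real value can never collide with an NA marker.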