ENH: add mask-aware implementation of factorize algos #30037

Now that we are starting to have mask-based dtypes/arrays (integer, boolean), we should also look into making our algos work with such masked arrays. A good candidate for exploring this is `factorize` / `unique`.

Currently, BooleanArray and IntegerArray need to convert their masked array (data plus mask) into a single numpy array that uses a specified "NA sentinel", so that the algo can recognize missing values through this sentinel. This happens through `ExtensionArray._values_for_factorize`, which returns a (numpy array, NA sentinel) tuple.
In practice this means that BooleanArray is converted to an integer array (with NA as -1), and IntegerArray is converted to a float array (with NA as NaN), so the algos can handle this.
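
To make the conversion described above concrete, here is a minimal sketch. The helper names `bool_values_for_factorize` and `int_values_for_factorize` are hypothetical and only mimic the behaviour described in this issue; they are not the actual pandas implementation.

```python
import numpy as np


def bool_values_for_factorize(data, mask):
    """Hypothetical helper: boolean data + mask -> (int array, NA sentinel)."""
    values = data.astype("int8")  # astype returns a copy, input is untouched
    values[mask] = -1             # masked positions get the -1 sentinel
    return values, -1


def int_values_for_factorize(data, mask):
    """Hypothetical helper: integer data + mask -> (float array, NA sentinel)."""
    values = data.astype("float64")  # integers are cast to float to hold NaN
    values[mask] = np.nan            # masked positions get NaN
    return values, np.nan


bool_values, bool_na = bool_values_for_factorize(
    np.array([True, False, True]), np.array([False, False, True])
)
int_values, int_na = int_values_for_factorize(
    np.array([1, 2, 3]), np.array([False, True, False])
)
```

Both helpers allocate a converted copy of the data only to encode information the mask already carries, which is the overhead a mask-aware algo could avoid.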

We should look into:

- Can we adapt the unique/factorize hashtable class, or make a specific version of it, so that it takes a mask instead of an NA sentinel?
- We could then have a variant of `ExtensionArray._values_for_factorize` that returns (array, mask) instead of (array, NA sentinel).
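
A rough sketch of what a mask-aware factorize could look like, written in pure Python/numpy for illustration. The function name `factorize_masked` and its signature are assumptions, and a plain `dict` stands in for the hashtable class; a real implementation would presumably live in the Cython hashtable code.

```python
import numpy as np


def factorize_masked(values, mask, na_code=-1):
    """Hypothetical sketch: factorize `values`, skipping positions where `mask` is True.

    Returns (codes, uniques). Masked positions get `na_code` and never
    enter the table, so no sentinel value is needed inside `values`.
    """
    table = {}  # value -> code; stands in for the hashtable
    codes = np.empty(len(values), dtype=np.intp)
    uniques = []
    for i, (val, is_na) in enumerate(zip(values.tolist(), mask.tolist())):
        if is_na:
            codes[i] = na_code  # missing value: consult the mask, not the data
            continue
        code = table.get(val)
        if code is None:
            code = len(uniques)  # first occurrence gets the next free code
            table[val] = code
            uniques.append(val)
        codes[i] = code
    return codes, np.array(uniques, dtype=values.dtype)


values = np.array([3, 1, 3, 0, 1])
mask = np.array([False, False, False, True, False])
codes, uniques = factorize_masked(values, mask)
```

Here the integer data keeps its original dtype throughout; the value stored at the masked position (0 above) is simply ignored.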
