Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
A Categorical can currently be converted to a numpy array with the to_numpy method. This is fine if the underlying array is a numpy array, but if it is an ExtensionArray, there is no way to get an ExtensionArray back from the Categorical other than recreating the ExtensionArray manually.
For example, for StringArray:
>>> import pandas as pd
>>> arr = pd.array(["a", pd.NA])
>>> cat = pd.Categorical(arr)
>>> cat.to_numpy()
array(['a', <NA>], dtype=object) # does not maintain dtype
>>> cat.categories.array[cat.codes] # manual method: keeps the array type and dtype, but the -1 code for <NA> indexes the last category, so the missing value is lost
<StringArray>
['a', 'a']
Length: 2, dtype: string
Feature Description
I propose adding a to_array method to Categorical, CategoricalAccessor & CategoricalIndex, which will return an array of the appropriate type (numpy array or ExtensionArray) with the same length as the Categorical. It should probably be possible to convert between Categoricals and ExtensionArrays/numpy.ndarrays losslessly, and this round trip should be covered by tests.
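As a rough illustration of the intended behaviour, here is a minimal sketch. It assumes the method can be built on the existing ExtensionArray.take with allow_fill=True, so that a code of -1 becomes a missing value instead of indexing the last category. The name categorical_to_array is hypothetical and only stands in for the proposed to_array; it is not an existing pandas function, and only the ExtensionArray-backed case is exercised here.

```python
import pandas as pd


def categorical_to_array(cat):
    """Hypothetical stand-in for the proposed Categorical.to_array().

    Rebuilds an array with the same type and dtype as the categories.
    """
    # Array backing the categories (a StringArray in the example below).
    categories = cat.categories.array
    # allow_fill=True makes take() treat -1 as "missing" and insert the
    # dtype's NA value, rather than wrapping around to the last element.
    return categories.take(cat.codes, allow_fill=True)


arr = pd.array(["a", pd.NA], dtype="string")
cat = pd.Categorical(arr)
result = categorical_to_array(cat)

# Round-trip check in the spirit of the lossless conversion asked for above.
assert len(result) == len(cat)
assert result.dtype == cat.categories.dtype
assert result[0] == "a"
assert pd.isna(result[1])
```

Unlike plain indexing with cat.codes, this keeps the missing value in the second position, which is what a lossless round trip would require.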
Alternative Solutions
The alternative would be to create the underlying array manually, as in the example above.
Additional Context
No response