ENH: add Categorical.to_array, CategoricalAccessor.to_array and CategoricalIndex.to_array #50041

Open
@topper-123

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

A Categorical can currently be converted to a numpy array with the to_numpy method. This is fine if the categories are backed by a numpy array, but if they are backed by an ExtensionArray, there is no way to get an ExtensionArray back from the Categorical other than recreating it manually.

For example, for StringArray:

>>> import pandas as pd
>>> arr = pd.array(["a", pd.NA])
>>> cat = pd.Categorical(arr)
>>> cat.to_numpy()
array(['a', <NA>], dtype=object)  # does not maintain dtype
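>>> # manual method for getting the desired array type and dtype; take with
>>> # allow_fill=True maps code -1 to <NA> (plain indexing with cat.codes would
>>> # wrap -1 around to the last category and return ['a', 'a'])
>>> cat.categories.array.take(cat.codes, allow_fill=True)
<StringArray>
['a', <NA>]
Length: 2, dtype: string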

Feature Description

I propose adding a to_array method to Categorical, CategoricalAccessor & CategoricalIndex, which will return an array of the appropriate type (numpy array or ExtensionArray) with the same length as the Categorical. Converting between Categoricals and ExtensionArrays/numpy.ndarrays should probably be lossless in both directions, and this should be covered by tests.
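
To make the proposal concrete, here is a rough sketch of what such a method might do, written as a standalone function. The name categorical_to_array is hypothetical and is not an existing pandas API; the sketch assumes a recent pandas version in which the categories Index can hold an ExtensionArray. It uses take with allow_fill=True so that code -1 becomes the dtype's missing value instead of wrapping around to the last category, which is what plain integer indexing would do.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def categorical_to_array(cat: pd.Categorical):
    """Hypothetical stand-in for the proposed Categorical.to_array."""
    if not isinstance(cat.categories.dtype, ExtensionDtype):
        # numpy-backed categories: fall back to the existing numpy conversion
        return np.asarray(cat)
    # ExtensionArray-backed categories: look up each code in the categories.
    # allow_fill=True maps code -1 to the dtype's missing value.
    return cat.categories.array.take(cat.codes, allow_fill=True)


arr = pd.array(["a", pd.NA], dtype="string")
cat = pd.Categorical(arr)

result = categorical_to_array(cat)

# Round trip back to a Categorical, as a lossless conversion would require
roundtrip = pd.Categorical(result)
```

Under these assumptions, result should have the same dtype and length as arr, with the missing value preserved in the second position, and roundtrip should carry the same codes as cat.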

Alternative Solutions

The alternative would be to create the underlying array manually, as in the example above.

Additional Context

No response
