.values on ExtensionArray-backed containers #19954

Closed
@TomAugspurger

Description

Discussed briefly on the call today, but we should go through things formally.

What should the return type of Series[extension_array].values and Index[extension_array].values be? I believe the two options are:

  1. Return the ExtensionArray backing it (e.g. like what Categorical does)
  2. Return an ndarray with some information loss / performance cost
    • e.g. like Series[datetimeTZ].values -> datetime64ns at UTC
    • e.g. Series[period].values -> ndarray[Period objects]
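
To make the two options concrete, here is a minimal sketch using the behaviors listed above: a categorical Series for option 1 and a tz-aware Series for option 2. It assumes a pandas version that behaves like the session shown in this issue; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Option 1: .values hands back the backing array itself (a Categorical here).
cat_values = pd.Series(pd.Categorical(["a", "b", "a"])).values
print(type(cat_values).__name__)

# Option 2: .values converts to an ndarray, and information is lost on the way.
tz_series = pd.Series(pd.date_range("2017", periods=1, tz="US/Eastern"))
tz_values = tz_series.values
print(type(tz_values).__name__, tz_values.dtype)

# The time zone is gone: midnight US/Eastern is stored as the naive UTC instant.
print(tz_series.iloc[0], "->", tz_values[0])
```

The second case shows the cost of option 2: the caller can no longer tell which time zone the data came from.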

Current State

Not sure how much weight we should put on the current behavior, but for reference:

| type | Series.values | Index.values |
|---|---|---|
| datetime | datetime64ns | datetime64ns |
| datetime-tz | datetime64ns (UTC & naive) | datetime64ns (UTC & naive) |
| categorical | Categorical | Categorical |
| period | NA | ndarray[Period objects] |
| interval | NA | ndarray[Interval objects] |

In [5]: pd.Series(pd.date_range('2017', periods=1)).values
Out[5]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [6]: pd.Series(pd.date_range('2017', periods=1, tz='US/Eastern')).values
Out[6]: array(['2017-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

In [7]: pd.Series(pd.Categorical([1])).values
Out[7]:
[1]
Categories (1, int64): [1]

In [8]: pd.Series(pd.SparseArray([1])).values
Out[8]:
[1]
Fill: 0
IntIndex
Indices: array([0], dtype=int32)

In [9]: pd.date_range('2017', periods=1).values
Out[9]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

In [10]: pd.date_range('2017', periods=1, tz='US/Central').values
Out[10]: array(['2017-01-01T06:00:00.000000000'], dtype='datetime64[ns]')

In [11]: pd.period_range('2017', periods=1, freq='D').values
Out[11]: array([Period('2017-01-01', 'D')], dtype=object)

In [12]: pd.interval_range(start=0, periods=1).values
Out[12]: array([Interval(0, 1, closed='right')], dtype=object)

In [13]: pd.CategoricalIndex([1]).values
Out[13]:
[1]
Categories (1, int64): [1]
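
The session above can be condensed into a small survey script for re-checking the table against whatever pandas version is installed. This is only a sketch: `values_type` and `report` are hypothetical names, and the printed types depend on the pandas version, so no output is shown.

```python
import pandas as pd

# One index per row of the table above.
containers = {
    "datetime": pd.date_range("2017", periods=1),
    "datetime-tz": pd.date_range("2017", periods=1, tz="US/Eastern"),
    "categorical": pd.CategoricalIndex([1]),
    "period": pd.period_range("2017", periods=1, freq="D"),
    "interval": pd.interval_range(start=0, periods=1),
}


def values_type(obj):
    """Hypothetical helper: name of the type that .values returns."""
    return type(obj.values).__name__


# For each kind, record (type of Series.values, type of Index.values).
report = {
    name: (values_type(pd.Series(idx)), values_type(idx))
    for name, idx in containers.items()
}

for name, (series_type, index_type) in report.items():
    print(f"{name:12} {series_type:16} {index_type}")
```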

If we decide to have the return values be ExtensionArrays, we'll need to discuss
to what extent they're part of the public API.

Regardless of the choice for .values, we'll probably want to support the other
use case (maybe just by documenting "call np.asarray on it"). Internally, we
have ._values ("best" array, ndarray or EA) and ._ndarray_values (always an
ndarray).
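
As a sketch of the "call np.asarray on it" route, the snippet below converts whatever .values returns into a plain ndarray. The helper name `as_ndarray` is hypothetical and not part of pandas; the private `._values` and `._ndarray_values` attributes are deliberately not used here.

```python
import numpy as np
import pandas as pd


def as_ndarray(container):
    """Hypothetical helper: always return a NumPy ndarray for a Series or Index."""
    return np.asarray(container.values)


cat_series = pd.Series(pd.Categorical([1, 2, 1]))
plain_series = pd.Series([1, 2, 3])

# .values is a Categorical for the first Series and an ndarray for the second,
# but np.asarray gives an ndarray in both cases.
cat_array = as_ndarray(cat_series)
plain_array = as_ndarray(plain_series)

print(type(cat_series.values).__name__, "->", type(cat_array).__name__)
print(type(plain_series.values).__name__, "->", type(plain_array).__name__)
```

With this approach, users who need an ndarray have one documented way to get it regardless of which option is chosen for .values.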

cc @jreback @jorisvandenbossche @jschendel @jbrockmendel @shoyer @chris-b1

Labels: API Design, Compat (pandas objects compatibility with NumPy or Python functions), ExtensionArray (extending pandas with custom dtypes or arrays)