Closed
Description
Discussed briefly on the call today, but we should go through things formally.
What should the return type of Series[extension_array].values and Index[extension_array].values be? I believe the two options are:
- Return the ExtensionArray backing it (e.g. like what Categorical does)
- Return an ndarray with some information loss / performance cost
  - e.g. like Series[datetimeTZ].values -> datetime64ns at UTC
  - e.g. Series[period].values -> ndarray[Period objects]
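To make the trade-off between the two options concrete, here is a minimal sketch. It assumes a pandas version in which categorical data follows the first option and tz-aware data follows the second, as in the reference table in this issue; the variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

# Option 1: .values hands back the array that backs the Series.
cat_series = pd.Series(pd.Categorical([1, 2, 1]))
cat_values = cat_series.values
print(type(cat_values).__name__)  # Categorical

# Option 2: .values hands back a plain ndarray, dropping whatever
# NumPy cannot represent (here, the time zone).
tz_series = pd.Series(pd.date_range("2017", periods=1, tz="US/Eastern"))
tz_values = tz_series.values
print(type(tz_values).__name__)  # ndarray

# The Series still knows its time zone; the ndarray only holds UTC instants.
print(tz_series.dt.tz)
print(tz_values.dtype)
```

The second case is the "information loss" mentioned above: the ndarray alone is not enough to rebuild the original tz-aware data.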
Current State
Not sure how much weight we should put on the current behavior, but for reference:
| type | Series.values | Index.values |
|---|---|---|
| datetime | datetime64ns | datetime64ns |
| datetime-tz | datetime64ns(UTC&naive) | datetime64ns(UTC&naive) |
| categorical | Categorical | Categorical |
| period | NA | ndarray[Period objects] |
| interval | NA | ndarray[Interval objects] |
In [5]: pd.Series(pd.date_range('2017', periods=1)).values
Out[5]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
In [6]: pd.Series(pd.date_range('2017', periods=1, tz='US/Eastern')).values
Out[6]: array(['2017-01-01T05:00:00.000000000'], dtype='datetime64[ns]')
In [7]: pd.Series(pd.Categorical([1])).values
Out[7]:
[1]
Categories (1, int64): [1]
In [8]: pd.Series(pd.SparseArray([1])).values
Out[8]:
[1]
Fill: 0
IntIndex
Indices: array([0], dtype=int32)
In [9]: pd.date_range('2017', periods=1).values
Out[9]: array(['2017-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
In [10]: pd.date_range('2017', periods=1, tz='US/Central').values
Out[10]: array(['2017-01-01T06:00:00.000000000'], dtype='datetime64[ns]')
In [11]: pd.period_range('2017', periods=1, freq='D').values
Out[11]: array([Period('2017-01-01', 'D')], dtype=object)
In [12]: pd.interval_range(start=0, periods=1).values
Out[12]: array([Interval(0, 1, closed='right')], dtype=object)
In [13]: pd.CategoricalIndex([1]).values
Out[13]:
[1]
Categories (1, int64): [1]
If we decide to have the return values be ExtensionArrays, we'll need to discuss
to what extent they're part of the public API.
Regardless of the choice for .values, we'll probably want to support the other
use case (maybe just by documenting "call np.asarray on it"). Internally, we
have ._values ("best" array, ndarray or EA) and ._ndarray_values (always an
ndarray).
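As a rough illustration of the "call np.asarray on it" suggestion, the sketch below assumes .values returns the backing array for categorical data, as it does in the session above. The helper name as_ndarray is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def as_ndarray(obj):
    """Hypothetical helper: always return an ndarray, whatever .values gives."""
    return np.asarray(obj.values)


series = pd.Series(pd.Categorical([1, 2, 1]))
index = pd.CategoricalIndex([1, 2, 1])

series_arr = as_ndarray(series)
index_arr = as_ndarray(index)

# Both results are plain ndarrays, even though .values is a Categorical.
print(type(series.values).__name__, "->", type(series_arr).__name__)
print(type(index.values).__name__, "->", type(index_arr).__name__)
```

For data that is already backed by an ndarray, np.asarray returns its input without copying, so a caller who always wants an ndarray can apply it unconditionally.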
cc @jreback @jorisvandenbossche @jschendel @jbrockmendel @shoyer @chris-b1