API: Index and Array constructors design

This splits the constructor discussion off from https://github.com/pandas-dev/pandas/issues/23185 so that we can have a more focused discussion here. It also continues the discussion we were having in https://github.com/pandas-dev/pandas/pull/23140/files#r225218594

So the topic of this issue: what should the different constructors look like for the internal EAs and the Index classes based on those EAs (specifically the datetimelike ones)?

### Index constructors

I think that for the Index constructors there is not *that* much to discuss.
We have:

- default `Index(..)` (`__new__` or `__init__`): this is quite overloaded for some of the index classes, but that's the way it is now since they are exposed to the user.
- `_simple_new`: I think we agree (following Tom's comment here https://github.com/pandas-dev/pandas/pull/23093#pullrequestreview-164015080) that it should basically just take the EA array and optionally a name, without any inference or validation:

    ```
    @classmethod
    def _simple_new(cls, values, name=None):
        # type: (Union[ndarray, ExtensionArray], Optional[Any]) -> Index
        result = object.__new__(cls)
        result._data = values
        result.name = name
        result._reset_identity()
        return result
    ```

- `_shallow_copy` and `_shallow_copy_with_infer` need another look before we can propose something for them.
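
To make the split between the overloaded public constructor and `_simple_new` concrete, here is a minimal runnable sketch. `ToyIndex` and its toy `_reset_identity` are hypothetical stand-ins for illustration, not the pandas implementation:

```python
class ToyIndex:
    """Hypothetical stand-in for an Index class (not pandas code)."""

    def __new__(cls, data=None, name=None):
        # Public, user-facing path: this is where coercion and copying
        # would live. Here it only materialises the input as a list.
        values = list(data) if data is not None else []
        return cls._simple_new(values, name=name)

    @classmethod
    def _simple_new(cls, values, name=None):
        # Internal fast path: no inference, no validation, no copy.
        # The caller guarantees that `values` is already the right container.
        result = object.__new__(cls)
        result._data = values
        result.name = name
        result._reset_identity()
        return result

    def _reset_identity(self):
        # Toy version: attach a fresh marker object.
        self._id = object()


values = [1, 2, 3]
fast = ToyIndex._simple_new(values, name="x")
public = ToyIndex((1, 2, 3), name="y")
```

Here `fast._data` is the very same object that was passed in, while the public constructor builds a new list from its input.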



### Array constructors

The default Index constructors mix a lot of different things (which is partly what led to the suite of other constructors), and I personally don't think this is something we necessarily need to repeat for the Array constructors.

In the discussion related to 

Each Array type might have its own specific constructors (like we have `IntervalArray.from_breaks` and others), but I think that in the discussion we were having in https://github.com/pandas-dev/pandas/pull/23140/files#r225218594, there are three clearly defined use cases that are generic across the different dtypes. Constructing from:

1) physical values (ordinals + freq for Period, datetime64 + optional tz for Datetime, int ndarray + mask for IntegerArray)
2) extension Array (i.e. accept itself)
3) array of scalars (e.g. an object ndarray of Period or Timestamp objects)

For this last item, we already have `_from_sequence` for exactly this as part of the EA interface.

So one option is to simply accept all three of those things in the main Array `__init__`; another option is to have separate constructors for them. I think this is what the discussion is mainly about?
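
As a rough illustration of the "separate constructors" option, the sketch below keeps cases 1 and 2 in `__init__` and routes case 3 through `_from_sequence`. `ToyPeriod` and `ToyPeriodArray` are hypothetical names, plain lists stand in for ndarrays, and the `_from_sequence` signature is simplified compared to the EA interface:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPeriod:
    """Hypothetical scalar: an ordinal plus a frequency string."""
    ordinal: int
    freq: str


class ToyPeriodArray:
    """Hypothetical array type (not pandas code)."""

    def __init__(self, values, freq=None):
        if isinstance(values, ToyPeriodArray):
            # Case 2: accept itself.
            if freq is not None and freq != values.freq:
                raise ValueError("freq does not match the passed array")
            values, freq = values._ordinals, values.freq
        # Case 1: physical values (integer ordinals + freq).
        values = list(values)
        if freq is None:
            raise TypeError("freq is required when passing ordinals")
        if not all(isinstance(v, int) for v in values):
            raise TypeError("expected integer ordinals; use _from_sequence for scalars")
        self._ordinals = values
        self.freq = freq

    @classmethod
    def _from_sequence(cls, scalars):
        # Case 3: a sequence of scalar objects; the freq is inferred here.
        scalars = list(scalars)
        freqs = {p.freq for p in scalars}
        if len(freqs) != 1:
            raise ValueError("need a non-empty sequence with a single freq")
        return cls([p.ordinal for p in scalars], freq=freqs.pop())


from_ordinals = ToyPeriodArray([10, 11], freq="D")
from_self = ToyPeriodArray(from_ordinals)
from_scalars = ToyPeriodArray._from_sequence([ToyPeriod(1, "M"), ToyPeriod(2, "M")])
```

With this layout, passing scalars to `__init__` fails loudly instead of being silently converted, so the caller has to say which kind of data it holds.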

I see the following advantages of keeping them separate (or at least keeping the third item separate):

- Code clarity throughout the Array implementation: To quote Tom from (https://github.com/pandas-dev/pandas/pull/23140#discussion_r225371045): 
  > From the WIP PeriodArray PR, I found that having to think carefully about what type of data I had forced some clarity in the code. I liked having to explicitly reach for that _from_periods constructor.
- Keep concerns separated inside the constructor -> code clarity in the constructors themselves
- We already decided that we will not rely on the default constructor in the EA interface but rather have specific `_from_sequence` and `_from_factorized` constructors, so we cannot use it anyway in places that need to deal with EAs in general

Also note that this is basically what we have for the new IntegerArray. Its `__init__` only accepts an ndarray of integers + a mask, and there is a separate function `integer_array` that provides a more general-purpose constructor (from a list, from floats, detecting NaNs as missing values, etc.), which is then used in `_from_sequence`.
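
A simplified sketch of that pattern, with plain lists instead of ndarrays. `ToyIntegerArray` and `toy_integer_array` are hypothetical names that only mirror the shape of the design; the coercion rules are assumptions for illustration, not the actual pandas behaviour:

```python
import math


class ToyIntegerArray:
    """Hypothetical array: strict __init__ taking values + mask only."""

    def __init__(self, values, mask):
        values, mask = list(values), list(mask)
        if not all(isinstance(v, int) for v in values):
            raise TypeError("values must be integers")
        if not all(isinstance(m, bool) for m in mask):
            raise TypeError("mask must be booleans")
        if len(values) != len(mask):
            raise ValueError("values and mask must have the same length")
        self._data = values
        self._mask = mask

    @classmethod
    def _from_sequence(cls, scalars):
        # The EA-interface entry point delegates to the general helper.
        return toy_integer_array(scalars)


def toy_integer_array(values):
    """General-purpose constructor: ints, integral floats, None and NaN."""
    data, mask = [], []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            data.append(0)  # placeholder value hidden by the mask
            mask.append(True)
        elif isinstance(v, float):
            if not v.is_integer():
                raise TypeError("cannot safely cast non-integral float")
            data.append(int(v))
            mask.append(False)
        elif isinstance(v, int):
            data.append(v)
            mask.append(False)
        else:
            raise TypeError("unsupported value: %r" % (v,))
    return ToyIntegerArray(data, mask)


arr = toy_integer_array([1, 2.0, None, float("nan")])
same_path = ToyIntegerArray._from_sequence([3])
```

All of the "what did the user hand me" logic lives in the helper, and `__init__` stays a thin check on already-clean data.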

cc @TomAugspurger @jreback @jbrockmendel 
