Skip to content

REF: simplify _constructor/_from_mgr/_constructor_from_mgr/... #56681

Open
@jbrockmendel

Description

@jbrockmendel

ATM we have

  • _constructor
  • _constructor_sliced
  • _constructor_expanddim
  • _from_mgr
  • _sliced_from_mgr
  • _expanddim_from_mgr
  • _constructor_from_mgr
  • _constructor_sliced_from_mgr
  • _constructor_expanddim_from_mgr

I suspect that we can make do with fewer of these, or at least simplify them (e.g. make some @final, @classmethod, #51772, make downstream libraries never touch Managers, very slightly improve perf in the non-subclass cases). Some of these ideas are mutually exclusive:

  1. ATM _from_mgr is a classmethod, all the other _from_mgr methods are regular methods. The constructor_from_mgr methods rely on self._constructor*, so could not be made classmethods unless those constructors also became classmethods. Does anyone out there rely on _constructor not being a classmethod?

  2. _expanddim_from_mgr and _sliced_from_mgr are each only used once, in _constructor_expanddim_from_mgr and _constructor_sliced_from_mgr, respectively. Can we just inline these and remove the methods (maybe keep the less-verbose names)? If not, can we at least @final and @classmethod them?

  3. Can _from_mgr be made @final? If not can _constructor_from_mgr be inlined into it?

  4. The 3 listed expanddim methods are there just for Series.to_frame. We could get rid of all three and tell subclasses to just override to_frame. (Doing this would also make it easier to tell subclass authors to handle metadata propagation in to_frame xref _metadata items of subclassed pd.Series are not propagated into corresponding SubclassedDataFrame #32860)

  5. We can slightly improve perf for non-subclasses by changing if self._constructor is Series to if type(self) is Series at the cost of slightly hurting perf for subclasses.

  6. Most extreme: we can make it so that a) (most) subclasses don't need to override any from_mgr methods, b) subclasses __init__ don't need to know how to handle Managers, c) we don't need the constructorfrom_mgr methods and d) (most) subclass authors never touch Managers by writing e.g.

def _from_mgr(self, mgr, axes):
    df = DataFrame._existing_from_mgr_method(mgr, axes)
    if type(self) is DataFrame:
        return df
    return self._constructor(df)

i.e. assume that the subclass __init__ knows what to do when given a pandas object.

@jorisvandenbossche writes that this would inconvenience geopandas since it does inference when given a DataFrame but not when given a Manager. In cases like that geopandas would need to override _from_mgr (why "most" is needed in a) and d) above), but ATM they need to override _constructor_from_mgr, so that wouldn't be a net increase in Stuff They Need To Do.

This idea is roughly #53871, which was reverted by #54922.

Metadata

Metadata

Labels

Needs DiscussionRequires discussion from core team before further actionRefactorInternal refactoring of codeSubclassingSubclassing pandas objects

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions