Description
ATM we have
- _constructor
- _constructor_sliced
- _constructor_expanddim
- _from_mgr
- _sliced_from_mgr
- _expanddim_from_mgr
- _constructor_from_mgr
- _constructor_sliced_from_mgr
- _constructor_expanddim_from_mgr
I suspect that we can make do with fewer of these, or at least simplify them (e.g. make some @final
, @classmethod
, #51772, make downstream libraries never touch Managers, very slightly improve perf in the non-subclass cases). Some of these ideas are mutually exclusive:
-
ATM _from_mgr is a classmethod, all the other _from_mgr methods are regular methods. The constructor_from_mgr methods rely on
self._constructor*
, so could not be made classmethods unless those constructors also became classmethods. Does anyone out there rely on _constructor not being a classmethod? -
_expanddim_from_mgr
and_sliced_from_mgr
are each only used once, in_constructor_expanddim_from_mgr
and_constructor_sliced_from_mgr
, respectively. Can we just inline these and remove the methods (maybe keep the less-verbose names)? If not, can we at least@final
and@classmethod
them? -
Can
_from_mgr
be made@final
? If not can_constructor_from_mgr
be inlined into it? -
The 3 listed expanddim methods are there just for
Series.to_frame
. We could get rid of all three and tell subclasses to just overrideto_frame
. (Doing this would also make it easier to tell subclass authors to handle metadata propagation in to_frame xref _metadata items of subclassed pd.Series are not propagated into corresponding SubclassedDataFrame #32860) -
We can slightly improve perf for non-subclasses by changing
if self._constructor is Series
toif type(self) is Series
at the cost of slightly hurting perf for subclasses. -
Most extreme: we can make it so that a) (most) subclasses don't need to override any from_mgr methods, b) subclasses
__init__
don't need to know how to handle Managers, c) we don't need the constructorfrom_mgr methods and d) (most) subclass authors never touch Managers by writing e.g.
def _from_mgr(self, mgr, axes):
df = DataFrame._existing_from_mgr_method(mgr, axes)
if type(self) is DataFrame:
return df
return self._constructor(df)
i.e. assume that the subclass __init__
knows what to do when given a pandas object.
@jorisvandenbossche writes that this would inconvenience geopandas since it does inference when given a DataFrame but not when given a Manager. In cases like that geopandas would need to override _from_mgr (why "most" is needed in a) and d) above), but ATM they need to override _constructor_from_mgr, so that wouldn't be a net increase in Stuff They Need To Do.