BUG: If you add _metadata to a custom subclass of Series, the sequence name is lost when indexing #61491

Open
@vitalizzare


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


class MySeq(pd.Series):
    
    _metadata = ['property']
    
    @property
    def _constructor(self):
        return MySeq


seq = MySeq([*'abc'], name='data')

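assert seq.name == 'data'
assert seq[1:2].name == 'data'              # slicing keeps the name
assert seq[[1, 2]].name is None             # list indexing loses it (the bug)
assert seq.drop_duplicates().name is None   # drop_duplicates loses it too (the bug)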

Issue Description

Tested with pandas 2.2.3.

Let’s consider two variants of defining a custom subtype of pandas.Series. In the first one, no custom properties are added, while in the second one, custom metadata is included:

import pandas as pd


class MySeries(pd.Series):

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')

print(f'''Case without _metadata:
  {isinstance(seq[0:1], MySeries) = }
  {isinstance(seq[[0, 1]], MySeries) = }
  {seq[0:1].name = }  
  {seq[[0, 1]].name = }
''')

class MySeries(pd.Series):

    _metadata = ['property']

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')
seq.property = 'MyProperty'

print(f'''Case with defined _metadata:
  {isinstance(seq[0:1], MySeries) = }
  {isinstance(seq[[0, 1]], MySeries) = }
  {seq[0:1].name = }  
  {seq[[0, 1]].name = }
  {getattr(seq[0:1], 'property', 'NA') = }  
  {getattr(seq[[0, 1]], 'property', 'NA') = }  
''')

The output of the code above will be:

Case without _metadata:
  isinstance(seq[0:1], MySeries) = True
  isinstance(seq[[0, 1]], MySeries) = True
  seq[0:1].name = 'data'  
  seq[[0, 1]].name = 'data'

Case with defined _metadata:
  isinstance(seq[0:1], MySeries) = True
  isinstance(seq[[0, 1]], MySeries) = True
  seq[0:1].name = 'data'  
  seq[[0, 1]].name = None         <<< Problematic result of indexing
  getattr(seq[0:1], 'property', 'NA') = 'MyProperty'  
  getattr(seq[[0, 1]], 'property', 'NA') = 'MyProperty'  

So, when _metadata is defined, the series name is preserved when slicing but lost when indexing with a list. Without _metadata, the name is preserved in both cases.

As a workaround we can add 'name' to _metadata:

class MySeries(pd.Series):

    _metadata = ['property', 'name']

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')

assert seq[0:1].name == 'data'
assert seq[[0, 1]].name == 'data' 

However, I'm not sure whether treating name as a metadata attribute could cause other problems later on.
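A possible alternative to listing 'name' explicitly is to extend the base class's _metadata rather than replace it. This is only a sketch and rests on an assumption not verified here for every pandas version: that pd.Series stores the attribute it uses to carry the name in its own _metadata list, so that overriding the list in a subclass drops it.

```python
import pandas as pd


class MySeries(pd.Series):

    # Assumption: pd.Series._metadata already contains the entry that
    # carries the name, so we extend the list instead of overwriting it.
    _metadata = [*pd.Series._metadata, 'property']

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')
seq.property = 'MyProperty'

picked = seq[[0, 1]]   # list indexing, the case that loses the name above

assert isinstance(picked, MySeries)
assert picked.name == 'data'
assert picked.property == 'MyProperty'
```

If the assumption holds, the name travels with the other metadata and does not have to be named in the subclass.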

The problem arose when applying PyJanitor methods to user-defined DataFrames with _metadata. Specifically, drop_duplicates was applied to a single column taken on its own, and the code then tried to read that column's name in order to combine the result into a new DataFrame.
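The scenario can be sketched roughly as follows. This is not PyJanitor's code: MyFrame and dedupe_column are hypothetical names used only for illustration. The helper falls back to the requested column label when the name is missing, so it behaves the same whether or not the name survives drop_duplicates.

```python
import pandas as pd


class MySeries(pd.Series):

    _metadata = ['property']

    @property
    def _constructor(self):
        return MySeries


class MyFrame(pd.DataFrame):   # hypothetical user-defined frame

    _metadata = ['property']

    @property
    def _constructor(self):
        return MyFrame

    @property
    def _constructor_sliced(self):
        return MySeries


def dedupe_column(frame, column):
    """Drop duplicates in one column and wrap the result in a new frame."""
    deduped = frame[column].drop_duplicates()
    # Defensive fallback: use the requested label if the name was lost.
    name = column if deduped.name is None else deduped.name
    return pd.DataFrame({name: deduped})


df = MyFrame({'a': [1, 1, 2], 'b': [3, 4, 4]})
out = dedupe_column(df, 'a')

assert list(out.columns) == ['a']
assert out['a'].tolist() == [1, 2]
```

Without the fallback, code that relies on deduped.name would build a column labelled None on affected versions.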

Expected Behavior

import pandas as pd


class MySeq(pd.Series):
    
    _metadata = ['property']
    
    @property
    def _constructor(self):
        return MySeq


seq = MySeq([*'abc'], name='data')

assert seq[[1, 2]].name == 'data'
assert seq.drop_duplicates().name == 'data'

Installed Versions

INSTALLED VERSIONS

commit : cfe54bd
python : 3.13.2
python-bits : 64
OS : Linux
OS-release : 4.15.0-213-generic
Version : #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+2124.gcfe54bd5da
numpy : 2.3.0.dev0+git20250304.6611d55
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyiceberg : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
