BUG: Infinite recursion on inheritance and overloading __getattr__ in v2.1.0 #55120

Closed
@JoschD

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from contextlib import suppress
import pandas as pd

class TfsDataFrame(pd.DataFrame):
    """
    Extend DataFrame to add headers, which should be accessed in case 
    the name is not found in the DataFrame itself, but also 
    can be accessed manually by the `headers` attribute.
    """

    _metadata = ["headers"]

    def __init__(self, *args, **kwargs):
        self.headers = {}
        # Copy the headers when constructing from another TfsDataFrame
        with suppress(IndexError, AttributeError):
            self.headers = args[0].headers
        # Explicitly passed headers take precedence
        self.headers = kwargs.pop("headers", self.headers)
        super().__init__(*args, **kwargs)

    def __getattr__(self, name: str) -> object:
        # First try the regular DataFrame lookup (e.g. column access) ...
        try:
            return super().__getattr__(name)
        except AttributeError:
            pass

        # ... then fall back to the headers dictionary
        try:
            return self.headers[name]
        except KeyError:
            raise AttributeError(f"{name} is neither in the DataFrame nor in headers.")

    @property
    def _constructor(self):
        return TfsDataFrame


if __name__ == "__main__":
    df = TfsDataFrame(columns=["A", "B", "C"])
    print(df[["A", "B"]])

Issue Description

This is a reduced example of a DataFrame extension that we have been using for years. It worked fine up to and including version 2.0.3, but broke with 2.1.0.

Starting with 2.1.0, the code above runs into an infinite recursion, because df[["A", "B"]] creates a new TfsDataFrame (due to the _constructor).
In its __init__, args[0] is then already a TfsDataFrame object, but args[0].headers for some reason calls __getattr__(), as does self.headers[name] inside __getattr__(), causing the infinite recursion.
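The same loop can be reproduced without pandas. This sketch assumes (we have not confirmed it in the pandas source) that the TfsDataFrame passed as args[0] was created without running our __init__, so headers is missing from its instance __dict__. FallbackBase and HeadersLike are hypothetical stand-ins for pd.DataFrame and TfsDataFrame, and the instance is created with __new__ to skip __init__:

```python
class FallbackBase:
    """Hypothetical stand-in for pd.DataFrame: unknown names raise AttributeError."""

    def __getattr__(self, name):
        raise AttributeError(name)


class HeadersLike(FallbackBase):
    """Same __getattr__ pattern as TfsDataFrame, without pandas."""

    def __init__(self):
        self.headers = {}

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            pass
        try:
            # If "headers" is not in the instance __dict__, this access calls
            # __getattr__("headers") again, which ends up on this line again.
            return self.headers[name]
        except KeyError:
            raise AttributeError(f"{name} is neither in the object nor in headers.")


# Create an instance without running __init__, so "headers" is never set
half_built = HeadersLike.__new__(HeadersLike)

recursed = False
try:
    half_built.headers
except RecursionError:
    recursed = True

print(recursed)  # True
```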

This was not the case in the past: .headers should find the headers attribute directly, without ever going through __getattr__().

Interestingly,

df = TfsDataFrame(columns=["A", "B", "C"])
print(df.headers)

df = TfsDataFrame(columns=["A", "B", "C"], headers={"a":1})
print(df.headers["a"])

are indeed working as intended.

This is the most direct recursion loop I could find, but with 2.1.0 we have also experienced longer recursion loops, which can make the code run practically forever.

Expected Behavior

The code should not end up in an infinite recursion, just as it does not in pandas 2.0.3.
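Until this is resolved, one possible user-side guard (our own hypothetical sketch, not a pandas API and not the upstream fix) is to make __getattr__ refuse to look up headers through itself. FallbackBase again stands in for pd.DataFrame, and GuardedHeaders is a hypothetical name:

```python
class FallbackBase:
    """Hypothetical stand-in for pd.DataFrame: unknown names raise AttributeError."""

    def __getattr__(self, name):
        raise AttributeError(name)


class GuardedHeaders(FallbackBase):
    def __init__(self, headers=None):
        self.headers = headers if headers is not None else {}

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            pass
        # Guard: if "headers" itself could not be found, fail right away
        # instead of looking it up via self.headers (which would recurse).
        if name == "headers":
            raise AttributeError("headers has not been set on this object")
        try:
            return self.headers[name]
        except KeyError:
            raise AttributeError(
                f"{name} is neither in the object nor in headers."
            ) from None


g = GuardedHeaders({"a": 1})
print(g.a)  # 1, served from the headers dict

# Simulate an instance whose __init__ never ran
bare = GuardedHeaders.__new__(GuardedHeaders)
try:
    bare.a
except AttributeError as err:
    msg = str(err)
print(msg)  # headers has not been set on this object
```

Applied to TfsDataFrame, we would expect the suppress(IndexError, AttributeError) in __init__ to swallow this error, so the new frame would start with empty headers instead of recursing. That avoids the loop but would presumably still lose the headers in that code path; we have not verified this against pandas 2.1.0.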

Thank you very much for your help and your effort in maintaining pandas.
Its amazing set of features has made our work with measurement data much more comfortable and streamlined,
not to mention the greatly improved readability of our code compared to anno BP (before pandas).

Installed Versions

INSTALLED VERSIONS

commit : ba1cccd
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.14.0-162.18.1.el9_1.x86_64
Version : #1 SMP PREEMPT_DYNAMIC Tue Feb 28 03:54:43 EST 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0
numpy : 1.25.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 59.5.0
pip : 21.3.1
Cython : 0.29.35
pytest : 7.3.1
hypothesis : 6.75.8
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.2
sqlalchemy : None
tables : 3.8.0
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None

Labels: Regression (Functionality that used to work in a prior pandas version), Subclassing (Subclassing pandas objects)