
Import statements in period.pyx significantly impact performance #12903

Closed
@rs2


The following function-level imports in pandas/src/period.pyx significantly hurt performance when many Period objects are created, because they are executed on every call to Period.__init__. This should be a quick win.

    def __init__(self, value=None, freq=None, ordinal=None,
                 year=None, month=1, quarter=None, day=1,
                 hour=0, minute=0, second=0):
        from pandas.tseries import frequencies
        from pandas.tseries.frequencies import get_freq_code as _gfc

        # freq points to a tuple (base, mult);  base is one of the defined
        # periods such as A, Q, etc. Every five minutes would be, e.g.,
        # ('T', 5) but may be passed in as a string like '5T'
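A minimal sketch to isolate the cost of re-executing an import statement on every call, compared with binding the name once at module level. It uses json.dumps as a hypothetical stand-in for get_freq_code so that it runs without pandas. In plain Python, re-importing an already-loaded module is mostly a sys.modules lookup, so the gap measured here may be much smaller than what the issue reports for the compiled period.pyx; the actual cost depends on the Python and Cython versions.

```python
import timeit

# Module-level binding: the name is resolved once, when this module is loaded.
from json import dumps as _dumps_module_level


def with_local_import(obj):
    # The import statement runs on every call, like the imports in Period.__init__.
    from json import dumps
    return dumps(obj)


def with_module_import(obj):
    # Reuses the name bound once at module load time.
    return _dumps_module_level(obj)


def compare(n=100_000):
    """Return (seconds with per-call import, seconds with module-level import)."""
    t_local = timeit.timeit(lambda: with_local_import(1), number=n)
    t_module = timeit.timeit(lambda: with_module_import(1), number=n)
    return t_local, t_module


if __name__ == "__main__":
    t_local, t_module = compare()
    print(f"per-call import: {t_local:.4f}s, module-level import: {t_module:.4f}s")
```

Both functions produce the same result; only where the import statement executes differs.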

To reproduce, profile the code below and count how many times _find_and_load is called:

    import pandas as pd

    for _ in range(1000):
        pd.Period('2015-04-26')

Commit bfa8066 introduced the problem.

I will submit a pull request that replaces the incorrect fix for the circular dependency.


Labels: Performance (memory or execution speed performance), Period (Period data type)
