infer_dtype() function slower in latest version #28814

Closed
@borisaltanov

Code Sample

from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

print(pd.__version__)

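RUN_COUNT = 5

# 100,000 rows x 1,000 float64 columns
df = pd.DataFrame(np.ones((100000, 1000)))

# Elapsed time of each full pass over the columns
avg_times = []

for _ in range(RUN_COUNT):

    start = datetime.now()

    # Infer the dtype of every column, one at a time
    for col in df.columns:
        infer_dtype(df[col], skipna=True)

    avg_times.append(datetime.now() - start)

# Mean elapsed time per pass
print('Average time: ', np.mean(avg_times))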

Problem description

When I run the above code on my machine, there is a major difference in performance between versions. I ran the code in fresh conda environments, each created with Python 3.7 and one specific version of pandas. The table below shows the execution times.

| Pandas version | Execution time |
| -------------- | -------------- |
| 0.23.4         | 0:00:00.013999 |
| 0.25.1         | 0:00:15.623905 |

The code above is only an illustration. In most cases I need this function to get a more specific type for a column than the one returned by the .dtype attribute (usually when working with mixed data).
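To make the mixed-data use case concrete, here is a minimal sketch. The column names and values are made up for illustration and are not taken from my real data. Every column has dtype object, so .dtype cannot tell them apart, while infer_dtype gives a more specific label for each.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample data: all three columns are stored with dtype "object",
# but they hold different kinds of values.
df = pd.DataFrame(
    {
        "only_strings": pd.Series(["a", "b", None], dtype=object),
        "ints_and_strings": pd.Series([1, "b", 3], dtype=object),
        "ints_and_floats": pd.Series([1, 2.5, 3], dtype=object),
    }
)

# Map each column name to the label infer_dtype assigns to its contents.
# skipna=True ignores missing values such as None when inferring.
inferred = {col: infer_dtype(df[col], skipna=True) for col in df.columns}

# Show the generic storage dtype next to the inferred label.
for col in df.columns:
    print(col, df[col].dtype, inferred[col])
```

With this data, the inferred labels should be "string", "mixed-integer" and "mixed-integer-float" respectively, which is the extra detail I rely on.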

Labels: Performance (memory or execution speed performance)
