Description
Code Sample
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

print(pd.__version__)

RUN_COUNT = 5
df = pd.DataFrame(np.ones((100000, 1000)))

avg_times = []
for _ in range(RUN_COUNT):
    start = datetime.now()
    for col in df.columns:
        infer_dtype(df[col], skipna=True)
    avg_times.append(datetime.now() - start)

print('Average time: ', np.mean(avg_times))
Problem description
When I run the code above on my machine, there is a major difference in performance between pandas versions. I ran it in fresh conda environments, each created with Python 3.7 and one specific version of pandas. The table below shows the average time of one run (1000 `infer_dtype` calls, one per column) over 5 runs; 0.25.1 is roughly 1,000 times slower than 0.23.4.
| Pandas version | Average execution time (h:mm:ss.ffffff) |
|---|---|
| 0.23.4 | 0:00:00.013999 |
| 0.25.1 | 0:00:15.623905 |
The code above is only an illustration. In most cases I need this function to get a more specific type for a column than the one reported by its `.dtype` attribute (usually when working with mixed data).
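To make the use case concrete, here is a minimal sketch of the kind of call involved. The helper name `specific_types` and the sample columns are hypothetical, and it assumes that only object-dtype columns need value inspection, which may not hold for every workload. It is not a fix for the regression, only an illustration of why `.dtype` alone is not enough.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def specific_types(df):
    """Hypothetical helper: map each column name to a type label."""
    result = {}
    for col in df.columns:
        if df[col].dtype == object:
            # An object column can hold anything, so inspect its values.
            result[col] = infer_dtype(df[col], skipna=True)
        else:
            # Assumption: a non-object dtype is already specific enough.
            result[col] = str(df[col].dtype)
    return result


# Made-up sample data: two of the three columns report dtype "object".
df = pd.DataFrame({
    "numbers": np.ones(3),
    "text": pd.Series(["a", np.nan, "b"], dtype=object),
    "flags": pd.Series([True, False, True], dtype=object),
})

types = specific_types(df)
print(df.dtypes)
print(types)
```

Here `df.dtypes` reports `object` for both `text` and `flags`, while `infer_dtype` tells them apart. Skipping the non-object columns also avoids the `infer_dtype` call for the all-float frame used in the timing example, although that only sidesteps the slowdown rather than explaining it.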