Closed
Description
On a 1,000,000 x 3 DataFrame (two int64 columns, one float64), df * df takes 142 ms with numexpr enabled but only 21.6 ms with numexpr disabled, so numexpr is roughly 6.5x slower here. I'm going to investigate, but it would be helpful to know whether anyone can reproduce this elsewhere.
In [10]: df = pd.DataFrame({"A": np.arange(1000000), "B": np.arange(1000000, 0, -1), "C": np.random.randn(1000000)})
In [11]: df * df
Out[11]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
A 1000000 non-null values
B 1000000 non-null values
C 1000000 non-null values
dtypes: float64(1), int64(2)
In [12]: %timeit df * df
10 loops, best of 3: 142 ms per loop
In [13]: pd.computation.expressions.set_use_numexpr(False)
In [14]: %timeit df * df
10 loops, best of 3: 21.6 ms per loop
In [15]: pd.computation.expressions.set_use_numexpr(True)
In [16]: %timeit df * df
10 loops, best of 3: 141 ms per loop
In [17]: len(df)
Out[17]: 1000000
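
For anyone trying to reproduce this outside IPython, here is a minimal standalone sketch of the same comparison. It assumes a recent pandas release, where the toggle is exposed as the "compute.use_numexpr" option rather than the pd.computation.expressions.set_use_numexpr call used above; the helper names make_frame and best_time are made up for illustration. If numexpr is not installed, both settings should take the same code path.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n=1_000_000):
    """Build a frame shaped like the one in the report: two integer columns, one float."""
    return pd.DataFrame({
        "A": np.arange(n),
        "B": np.arange(n, 0, -1),
        "C": np.random.randn(n),
    })


def best_time(df, use_numexpr, number=10, repeat=3):
    """Return the best per-call time in seconds for df * df (hypothetical helper)."""
    # Assumption: "compute.use_numexpr" is the option name in recent pandas.
    # option_context restores the previous setting when the block exits.
    with pd.option_context("compute.use_numexpr", use_numexpr):
        runs = timeit.repeat(lambda: df * df, number=number, repeat=repeat)
    return min(runs) / number


if __name__ == "__main__":
    df = make_frame()
    for flag in (True, False):
        ms = best_time(df, flag) * 1e3
        print(f"use_numexpr={flag}: {ms:.1f} ms per loop")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports, so the numbers should be roughly comparable to the ones above.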