Description
I just found a performance issue with `pd.Series.map`: it is very slow when the mapper is a huge dictionary.
I noticed a similar issue reported before, #21278. There, with Series input, the first run may be slow, but later runs are very fast because the hash table behind the index lookup is built once and then reused. However, this does not seem to apply to dict input.
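To make the #21278 behavior concrete, here is a rough timing sketch that calls `map` twice with the same Series mapper and twice with the same dict. It rests on an assumption I have not verified in this report: that pandas builds the index hash table lazily and caches it on the Index object, while a plain dict mapper is converted into a fresh Series on every call, so nothing can be cached. The names `lookup`, `series_first`, `dict_later`, etc. are illustrative only.

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000
domain = np.arange(0, n)
ranges = domain + 10

# Mapper as a Series: its Index builds a hash table lazily on first use and
# caches it, so later calls with this same object can reuse it.
lookup = pd.Series(ranges, index=domain)
# Mapper as a plain dict (assumption: converted to a new Series on each call).
maptable = lookup.to_dict()

query_vals = pd.Series([1, 2, 3])

# Time a first and a later call for each mapper type (single runs, seconds).
series_first = timeit.timeit(lambda: query_vals.map(lookup), number=1)
series_later = timeit.timeit(lambda: query_vals.map(lookup), number=1)
dict_first = timeit.timeit(lambda: query_vals.map(maptable), number=1)
dict_later = timeit.timeit(lambda: query_vals.map(maptable), number=1)

print(f"Series mapper: first {series_first:.4f} s, later {series_later:.4f} s")
print(f"dict mapper:   first {dict_first:.4f} s, later {dict_later:.4f} s")
```

If the assumption holds, `series_later` should be much smaller than `series_first`, while the two dict timings stay roughly equal; actual numbers depend on the machine and pandas version.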
I slightly modified the example from #21278: with a dict, the runtime stays the same however many times it is run, and it is much faster to use `apply` with `dict.get`.
So I am curious whether this performance issue is known. I would expect `pd.Series.map` with a dict and `pd.Series.apply(lambda x: ...)` to perform quite similarly.
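```python
import numpy as np
import pandas as pd

n = 1000000
domain = np.arange(0, n)
ranges = domain + 10
# Lookup table: a dict with 1M integer keys
maptable = pd.Series(ranges, index=domain).sort_index().to_dict()
query_vals = pd.Series([1, 2, 3])
# IPython/Jupyter magic: times map() with the dict as the mapper
%timeit query_vals.map(maptable)
```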
By contrast, this is much faster:

```python
%timeit query_vals.apply(lambda x: maptable.get(x))
```
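For running the comparison outside IPython, where `%timeit` is unavailable, here is a sketch using the standard `timeit` module. It also includes two workarounds that are my own suggestions, not something confirmed by the pandas maintainers: passing the bound method `maptable.get` directly to `map` (pandas then treats it as a function and calls it per element, so the dict should not be converted to a Series), and converting the dict to a Series once (`lookup`, a hypothetical name) and reusing it across calls.

```python
import timeit

import numpy as np
import pandas as pd

n = 1_000_000
domain = np.arange(0, n)
maptable = pd.Series(domain + 10, index=domain).to_dict()
query_vals = pd.Series([1, 2, 3])

# Hypothetical workaround: convert the dict to a Series once, outside the hot
# path, so its index (and cached hash table) is reused across calls.
lookup = pd.Series(maptable)

candidates = {
    "map(dict)": lambda: query_vals.map(maptable),
    "apply(dict.get)": lambda: query_vals.apply(lambda x: maptable.get(x)),
    # Passing a callable makes pandas call it once per element instead of
    # treating the argument as a mapping.
    "map(dict.get)": lambda: query_vals.map(maptable.get),
    "map(prebuilt Series)": lambda: query_vals.map(lookup),
}

results = {}
for name, fn in candidates.items():
    # Best of 3 single runs, in seconds; absolute numbers depend on the machine.
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    results[name] = fn()
    print(f"{name:22s} {seconds:.6f} s")
```

The callable-based variants cost one Python call per element, so they suit a small query Series against a large table; for a large query Series, reusing a prebuilt Series is probably the better trade-off.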