PERF: pd.Series.map too slow for huge dictionary #34717

Closed
@charlesdong1991

Description

Just found a performance issue with pd.Series.map: it is very slow when the input is a huge dictionary.

I noticed a similar issue reported before: #21278. Indeed, for Series input the first run may be slow, but later runs are fast because a hashtable index is built. However, this does not seem to apply to dict input.

I slightly changed the example from #21278, and the runtime does not improve when it is run multiple times. It is also much faster to use apply with dict.get.

So I am curious whether this performance issue is known. I would expect pd.Series.map with a dict to perform about as well as pd.Series.apply(lambda x: ...).

import numpy as np
import pandas as pd

n = 1000000
domain = np.arange(0, n)
ranges = domain + 10
maptable = pd.Series(ranges, index=domain).sort_index().to_dict()
query_vals = pd.Series([1, 2, 3])

%timeit query_vals.map(maptable)

while it is much faster to do the following:

query_vals.apply(lambda x: maptable.get(x))
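One possible workaround, sketched here based on the hashtable-caching behavior noted in #21278, is to convert the dict to a pd.Series once and reuse that Series for every map call, so the per-call dict-to-Series conversion is avoided (the variable name lookup is mine, not from the issue):

```python
import numpy as np
import pandas as pd

n = 1000000
domain = np.arange(0, n)
maptable = pd.Series(domain + 10, index=domain).sort_index().to_dict()
query_vals = pd.Series([1, 2, 3])

# Build the Series once; repeated map() calls can then reuse its
# hashtable-backed index instead of rebuilding it from the dict.
lookup = pd.Series(maptable)
result = query_vals.map(lookup)
```

This should give the same values as query_vals.map(maptable), just without paying the conversion cost on every call.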

Labels: Performance (Memory or execution speed performance)