Possible performance regression in Series.xs() and DataFrame.xs() with MultiIndex. #35188

Closed
@taozuoqiao

Series.xs() appears to have become much slower when selecting a key on the first level of a large MultiIndex after upgrading pandas from v0.25.3 to v1.0.5. Selection on the second level is essentially unchanged. DataFrame.xs() shows the same regression.

code:

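import numpy as np
import pandas as pd

# n is a float, so both index levels are float64 (1e8 rows in total)
n = 1e4
data = pd.Series(
    np.arange(n**2),
    index=pd.MultiIndex.from_product([np.arange(n), -np.arange(n)]),
)
# %timeit is an IPython magic; run these lines in IPython or Jupyter
%timeit data.xs(100, level=0)   # first level
%timeit data.xs(-100, level=1)  # second level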

outcome

## 0.25.3
%timeit data.xs(100, level=0)
571 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit data.xs(-100, level=1)
193 ms ± 947 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


## 1.0.5
%timeit data.xs(100, level=0)
683 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  

%timeit data.xs(-100, level=1)
197 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 
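For anyone who wants to compare versions outside IPython, the following is a minimal sketch that uses the standard-library `timeit` module instead of the `%timeit` magic. It also times `data.loc[100]` as a candidate alternative for first-level selection. That alternative, the helper `best_time`, and the reduced size `n = 200` with an integer index are assumptions made for illustration and are not part of the original report; whether `.loc` avoids the slowdown on 1.0.5 has not been verified here.

```python
import timeit

import numpy as np
import pandas as pd

# Much smaller than the n=1e4 in the report so the script finishes quickly;
# raise n to approach the original data size (assumption: integer index here).
n = 200
index = pd.MultiIndex.from_product([np.arange(n), -np.arange(n)])
data = pd.Series(np.arange(n**2), index=index)

via_xs = data.xs(100, level=0)  # the call timed in the report
via_loc = data.loc[100]         # candidate alternative for the first level

# Both selections should return the same values and the same index.
same = via_xs.equals(via_loc)


def best_time(func, number=20):
    """Hypothetical helper: seconds per call, best of 3 repeats."""
    return min(timeit.repeat(func, number=number, repeat=3)) / number


t_xs = best_time(lambda: data.xs(100, level=0))
t_loc = best_time(lambda: data.loc[100])
print(f"xs(level=0): {t_xs * 1e6:.1f} us   loc: {t_loc * 1e6:.1f} us   equal: {same}")
```

Running the script once under each pandas version gives numbers that can be compared directly, although absolute timings will differ from the ones above because the data is smaller.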

    Labels

    Closing Candidate: May be closeable, needs more eyeballs
    Indexing: Related to indexing on series/frames, not to indexes themselves
    MultiIndex
    Performance: Memory or execution speed performance
    Regression: Functionality that used to work in a prior pandas version
