Series.all much slower than Series.values.all #26032

Closed
@ericstarr

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# Series of bools

s = pd.Series(np.random.randint(0, 2, 100000)).astype(bool)

# ~1.45 ms
%timeit s.any(skipna=True)

# ~1.35 ms
%timeit s.any(skipna=False)

# ~6.5 us - Note that I get a message about possible caching, but
# even after multiplying by worst case multiplier, still an order of
# magnitude faster than s.any()
%timeit s.values.any()


# Series of ints

s2 = pd.Series(np.random.randint(0, 2, 100000))

# ~330 us
%timeit s2.any(skipna=True)

# ~280 us
%timeit s2.any(skipna=False)

# ~90 us - No possible caching warning on this one
%timeit s2.values.any()

Problem description

Calling Series.any is much slower than calling Series.values.any on a Series of bools: roughly 1.4 ms versus 6.5 us in the timings above.
Interestingly, Series.any on a Series of ints is several times faster than on a Series of bools (about 300 us versus 1.4 ms), though even for ints Series.values.any is still faster (about 90 us).

I ran with both skipna=True and skipna=False in case it was an issue of how NaNs are being handled.

I see the same time differences with Series.all.
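For anyone who wants to reproduce this outside IPython, here is a minimal sketch using the standard-library timeit module instead of %timeit. The helper names (best_time_us, compare) are made up for illustration, and the absolute numbers will differ by machine and pandas/numpy version; it only shows how the Series and .values paths can be timed side by side for both any and all.

```python
import timeit

import numpy as np
import pandas as pd


def best_time_us(func, number=100, repeat=5):
    """Return the best per-call time in microseconds over `repeat` runs."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1e6


def compare(series, method_name):
    """Time Series.<method> against Series.values.<method> (hypothetical helper)."""
    series_method = getattr(series, method_name)
    array_method = getattr(series.values, method_name)
    # With no missing values, both paths should give the same answer.
    assert bool(series_method()) == bool(array_method())
    return {
        "series_us": best_time_us(series_method),
        "values_us": best_time_us(array_method),
    }


# Fixed seed so the data is the same on every run.
rng = np.random.RandomState(0)
bool_series = pd.Series(rng.randint(0, 2, 100000)).astype(bool)
int_series = pd.Series(rng.randint(0, 2, 100000))

results = {}
for label, series in [("bool", bool_series), ("int", int_series)]:
    for method_name in ("any", "all"):
        results[(label, method_name)] = compare(series, method_name)

for key, timing in results.items():
    print(key, timing)
```

Taking the minimum over several repeats is meant to reduce noise from other processes; it does not rule out the caching effect that %timeit warned about for the bool case.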

Expected Output

I would expect the performance to be comparable. Maybe not exactly the same, but not order(s) of magnitude slower.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: 3.3.0
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.1
scipy: 0.18.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 1.5.1
openpyxl: 2.5.6
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.9999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: 0.0.8
fastparquet: None
pandas_gbq: None
pandas_datareader: None


Labels

Performance (Memory or execution speed performance)
