Open
Labels
ExtensionArray: Extending pandas with custom dtypes or arrays.
Needs Discussion: Requires discussion from core team before further action.
Performance: Memory or execution speed performance.
Description
Is it worth dispatching is_monotonic_increasing / is_monotonic_decreasing for ExtensionArrays (EAs)?
The Cython implementation stops early, but that benefit disappears when the data first has to be copied into an object array, as in the example below:
import pandas as pd
import pyarrow.compute as pc
values = [f"val_{i:07}" for i in range(1_000_000)]
ser = pd.Series(values, dtype="string[pyarrow_numpy]")
%timeit ser.is_monotonic_increasing
# 219 ms ± 20.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit pc.all(pc.greater_equal(ser.array._pa_array[1:], ser.array._pa_array[:-1]))
# 19.2 ms ± 585 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Random-order example, where the Cython implementation can stop early:
ser2 = ser.sample(frac=1.0)
%timeit ser2.is_monotonic_increasing
# 152 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pc.all(pc.greater_equal(ser2.array._pa_array[1:], ser2.array._pa_array[:-1]))
# 15 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)