PERF: improve construct_1d_object_array_from_listlike #60461

jorisvandenbossche · 2024-12-01T12:52:41Z

This improves construct_1d_object_array_from_listlike, especially for the case where the objects inside the array-like are themselves array-likes with a potentially expensive conversion to numpy.
It seems that when doing result[:] = values, numpy will still check the __array__ method of each object in values, whereas iterating and assigning the objects one by one avoids that.

And even in the case where __array__ is not expensive at all (or is absent), it seems that iterating is faster than the single assignment:

In [12]: class A:
    ...:     def __init__(self):
    ...:         self.data = np.random.randn(5)
    ...:     def __array__(self, dtype=None, copy=None):
    ...:         #print("calling __array__")
    ...:         return self.data

In [13]: N = 10_000

In [14]: data = [A() for _ in range(N)]

In [17]: %%timeit
    ...: arr = np.empty((N, ), dtype=object)
    ...: arr[:] = data
    ...: 
5.39 ms ± 33.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [18]: %%timeit
    ...: arr = np.empty((N, ), dtype=object)
    ...: for i, obj in enumerate(data):
    ...:     arr[i] = obj
    ...: 
424 µs ± 6.78 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
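
To check the claim about __array__ directly rather than through timings, a small sketch like the following counts the calls. The `Counted` class and its `calls` counter are hypothetical names introduced for illustration only, and the count for the slice assignment may differ between numpy versions, so it is printed rather than assumed.

```python
import numpy as np


class Counted:
    """Hypothetical array-like that counts how often __array__ is invoked."""

    calls = 0

    def __init__(self):
        self.data = np.zeros(5)

    def __array__(self, dtype=None, copy=None):
        Counted.calls += 1
        return self.data


data = [Counted() for _ in range(100)]

# Single slice assignment: numpy may inspect each element via __array__.
Counted.calls = 0
bulk = np.empty(len(data), dtype=object)
bulk[:] = data
calls_bulk = Counted.calls

# Element-wise assignment: each object is stored as-is.
Counted.calls = 0
loop = np.empty(len(data), dtype=object)
for i, obj in enumerate(data):
    loop[i] = obj
calls_loop = Counted.calls

print(f"__array__ calls: slice assignment={calls_bulk}, loop={calls_loop}")
```

If the observation above holds on your numpy version, the slice assignment reports a non-zero count while the loop reports zero.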

This is a useful performance improvement in general, I assume, but I am specifically doing it to fix the performance issue reported in #59657. That is mostly relevant for the 2.3.x branch, because on main the issue is already avoided by #57205 (which avoids the Series construction, and with it the call to construct_1d_object_array_from_listlike, in the first place).

closes PERF: Melt 2x slower when future.infer_string option enabled #59657
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2024-12-02T18:38:19Z

Some mypy errors, but nice find!

mypy.....................................................................Failed
- hook id: mypy
- duration: 88.21s
- exit code: 1

pandas/core/dtypes/cast.py:1604: error: Argument 1 to "len" has incompatible type "Iterable[Any]"; expected "Sized"  [arg-type]
pandas/core/common.py:256: error: Unused "type: ignore" comment  [unused-ignore]
Found 2 errors in 2 files (checked 1446 source files)

mroeschke · 2024-12-02T18:41:56Z

pandas/core/dtypes/cast.py

@@ -1602,7 +1602,8 @@ def construct_1d_object_array_from_listlike(values: Sized) -> np.ndarray:
    # numpy will try to interpret nested lists as further dimensions, hence
    # making a 1D array that contains list-likes is a bit tricky:
    result = np.empty(len(values), dtype="object")
-    result[:] = values
+    for i, obj in enumerate(values):


Any advantage of using np.fromiter(values, dtype="object", count=len(values))?

Ah, nice, wasn't aware of that. From a quick test that seems to be even a bit faster
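
For reference, a minimal sketch of the np.fromiter approach suggested above. The helper name `to_1d_object_array` is hypothetical and not the pandas function itself; it assumes a numpy version in which np.fromiter accepts dtype="object" (to my knowledge, 1.23 or later).

```python
import numpy as np


def to_1d_object_array(values):
    """Hypothetical helper: build a 1D object array from a sized iterable.

    np.fromiter stores each item as a single element, so nested
    list-likes are not unpacked into further dimensions.
    """
    return np.fromiter(values, dtype="object", count=len(values))


nested = [[1, 2], [3, 4]]

# np.array interprets the nested lists as a second dimension.
naive = np.array(nested, dtype=object)

# np.fromiter keeps each inner list as one element.
flat = to_1d_object_array(nested)

print(naive.shape, flat.shape)
```

Passing count lets numpy allocate the result once instead of growing it while iterating, which is why the argument needs to be Sized as well as Iterable.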

jorisvandenbossche · 2024-12-02T20:15:49Z

Some mypy errors,

If I want something that is both Iterable and Sized, then that's Collection or Sequence ?

mroeschke · 2024-12-02T20:36:48Z

If I want something that is both Iterable and Sized, then that's Collection or Sequence ?

Appears either should work from the inheritance structure (Sequence inherits from Collection) https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
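
A short sketch of the difference, using only collections.abc. The function `count_items` is a hypothetical stand-in for any function that needs both len() and iteration, which is what the mypy error above is about.

```python
from collections.abc import Collection, Sequence
from typing import Any


def count_items(values: Collection[Any]) -> int:
    """Hypothetical example: needs both len() (Sized) and iteration (Iterable)."""
    n = len(values)
    iterated = sum(1 for _ in values)
    assert n == iterated
    return n


# Sequence is a subclass of Collection, so any Sequence is accepted too.
print(issubclass(Sequence, Collection))

# A set is a Collection (sized, iterable, supports `in`) but not a Sequence,
# because it has no positional indexing.
print(isinstance({1, 2}, Collection), isinstance({1, 2}, Sequence))

print(count_items([1, 2, 3]), count_items({1, 2}))
```

Collection is the looser annotation of the two: it does not promise indexing, which a function that only calls len() and iterates does not need.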

mroeschke · 2024-12-03T18:34:31Z

Thanks @jorisvandenbossche

…_from_listlike

…_array_from_listlike) (#60483) Backport PR #60461: PERF: improve construct_1d_object_array_from_listlike Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

* PERF: improve construct_1d_object_array_from_listlike * use np.fromiter and update annotation

PERF: improve construct_1d_object_array_from_listlike

bda2878

jorisvandenbossche added Performance Memory or execution speed performance Constructors Series/DataFrame/Index/pd.array Constructors labels Dec 1, 2024

jorisvandenbossche added this to the 2.3 milestone Dec 1, 2024

This was referenced Dec 1, 2024

RLS: 2.3 #59664

Closed

PERF: Melt 2x slower when future.infer_string option enabled #59657

Closed

mroeschke reviewed Dec 2, 2024

use np.fromiter and update annotation

86a4b5f

mroeschke approved these changes Dec 3, 2024

mroeschke merged commit 8695401 into pandas-dev:main Dec 3, 2024
47 of 51 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Dec 3, 2024

Backport PR pandas-dev#60461: PERF: improve construct_1d_object_array…

2df238e

…_from_listlike

meeseeksmachine mentioned this pull request Dec 3, 2024

Backport PR #60461 on branch 2.3.x (PERF: improve construct_1d_object_array_from_listlike) #60483

Merged

jorisvandenbossche deleted the construction-1d-object branch December 3, 2024 20:51

TendouArisu mentioned this pull request Feb 16, 2025

Potential performance issue: Unreliable performance of df.melt in pandas 2.2.2 GeoscienceAustralia/dea-intertidal#103

Closed

KevsterAmp pushed a commit to KevsterAmp/pandas that referenced this pull request Mar 12, 2025

PERF: improve construct_1d_object_array_from_listlike (pandas-dev#60461)

eaaaa42

* PERF: improve construct_1d_object_array_from_listlike * use np.fromiter and update annotation

This was referenced Apr 4, 2025

BUG: Constructing dtype=object makes useless, slow calls to __array__ numpy/numpy#28651

Open

Pandas Series Construction Extremely Slow for Array of Large Series #25364

Open
