PERF: faster _coerce_to_data_and_mask() for astype("Float64") #60121

auderson · 2024-10-29T03:26:39Z

closes PERF: df.astype("float64[pyarrow]") is slow, df.astype("Float64") is super slow #60066
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Use np.isnan for floats instead of libmissing.is_numeric_na. Also add a fast path for booleans.
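A minimal sketch of the idea, for readers who want to see the dispatch in isolation. This is not the pandas implementation: `sketch_mask` is a hypothetical helper, the boolean branch assumes the fast path simply returns an all-False mask, and the fallback is a pure-Python stand-in for the slower element-wise check.

```python
import numpy as np


def sketch_mask(values):
    """Return a boolean missing-value mask for `values` (illustrative only)."""
    values = np.asarray(values)
    kind = values.dtype.kind
    if kind == "f":
        # Float ndarrays can only mark missing data with NaN,
        # so one vectorized np.isnan call is enough.
        return np.isnan(values)
    if kind in "iub":
        # Integer and boolean ndarrays cannot hold NaN: nothing is missing.
        return np.zeros(values.shape, dtype=np.bool_)
    # Generic fallback (stand-in for the slower element-wise check):
    # treat None and float NaN as missing.
    flat = [v is None or (isinstance(v, float) and v != v) for v in values.ravel()]
    return np.array(flat, dtype=np.bool_).reshape(values.shape)


float_mask = sketch_mask(np.array([1.0, np.nan, 3.0]))
bool_mask = sketch_mask(np.array([True, False]))
```

The point of branching on `dtype.kind` is that the float and boolean cases never need to inspect elements one by one from Python.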

Prev:

New:

auderson · 2024-10-29T05:03:46Z

Failures seem unrelated: FAILED pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_timezone_aware_index[timezone_aware_date_list1] - AssertionError: DataFrame.index are different

auderson · 2024-10-29T06:23:01Z

The asv results don't look as impressive as the results above; I'm guessing the scale of these tests is much smaller.

auderson · 2024-10-30T05:23:53Z

@rhshadrach friendly ping

jorisvandenbossche

This looks good to me, thanks a lot @auderson!

mroeschke · 2024-10-30T18:59:58Z

pandas/core/arrays/numeric.py

+        elif values.dtype.kind == "f":
+            # np.isnan is faster than is_numeric_na() for floats
+            # github issue: #60066
+            mask = np.isnan(values)


Could we ever get masked float values with pd.NA here?

There is a

    if not copy:
        values = np.asarray(values)
    else:
        values = np.array(values, copy=copy)

above, so values is guaranteed to be a numpy array at this point.
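The reply above can be checked with plain NumPy. The sketch below wraps the quoted branch in a hypothetical `to_ndarray` helper (the name is an assumption, not a pandas function) and shows that the result is always an `ndarray`, and that only inputs which become a float `ndarray` have dtype kind "f".

```python
import numpy as np


def to_ndarray(values, copy=False):
    # Same branching as the snippet quoted in the review thread.
    if not copy:
        return np.asarray(values)
    return np.array(values, copy=copy)


floats = to_ndarray([1.0, float("nan"), 3.0])
objects = to_ndarray([1.0, None])

float_kind = floats.dtype.kind    # "f": eligible for the np.isnan branch
object_kind = objects.dtype.kind  # "O": falls through to the generic check
float_mask = np.isnan(floats)
```

An input that cannot be stored as floats ends up with a different dtype kind, so it would not reach a branch guarded by `values.dtype.kind == "f"`.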

mroeschke · 2024-10-30T19:31:24Z

Thanks @auderson

auderson added 3 commits October 29, 2024 10:56

add fast path in _coerce_to_data_and_mask

757de8c

update whatsnew

4483edc

pre-commit

c2fa929

jorisvandenbossche added the Performance (memory or execution speed performance) and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels Oct 30, 2024

jorisvandenbossche approved these changes Oct 30, 2024


jorisvandenbossche added this to the 3.0 milestone Oct 30, 2024

mroeschke reviewed Oct 30, 2024


mroeschke approved these changes Oct 30, 2024


mroeschke merged commit 00d4189 into pandas-dev:main Oct 30, 2024
56 of 57 checks passed
