Fix binary operation column ordering and missing column issues #13778

galipremsagar · 2023-07-27T19:38:20Z

Description

This PR fixes several cases where binary operations between DataFrames and Series whose columns have certain dtypes produce incorrect results, incorrect result column types, or drop columns altogether.
This PR also ensures that the result's column ordering matches the column ordering of pandas binary ops when pandas compatibility mode is enabled.
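
For reference, the pandas ordering this description refers to can be seen in a small pandas-only sketch. The frame, series, and variable names are illustrative and not taken from the PR, and cuDF itself is not exercised here. With sortable labels, pandas keeps the frame's column order when the Series index matches the DataFrame columns; when the labels differ, it aligns on the sorted union of the labels and fills unmatched columns with NaN.

```python
import pandas as pd

# Illustrative data only; none of these names come from the PR.
df = pd.DataFrame({"b": [1, 2], "a": [3, 4]})

# Series index equals the frame's columns: the column order is preserved.
same_labels = pd.Series([10, 20], index=["b", "a"])
aligned = df + same_labels
print(list(aligned.columns))  # ['b', 'a']

# Series index differs: pandas aligns on the sorted union of the labels.
other_labels = pd.Series([10, 20], index=["a", "c"])
unioned = df + other_labels
print(list(unioned.columns))  # ['a', 'b', 'c']

# Only "a" exists on both sides; "b" and "c" are filled with NaN.
print(unioned)
```

A matching cuDF result would be expected to have the same column labels in the same order.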

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

galipremsagar · 2023-07-27T20:27:53Z

Note that there is some intersection with changes in #13772

python/cudf/cudf/core/dataframe.py

python/cudf/cudf/core/indexed_frame.py

Co-authored-by: Bradley Dice <bdice@bradleydice.com>

wence-

I think this looks fine, with one comment: can we potentially avoid one "expensive" check?

wence- · 2023-07-28T16:40:54Z

python/cudf/cudf/core/dataframe.py

+            equal_columns = other.index.to_pandas().equals(
+                self._data.to_pandas_index()
+            )
+            can_use_self_column_name = equal_columns or (
+                list(other._index._data.names) == self._data._level_names
+            )


Can we do the cheap thing first, so that we don't convert to pandas and check equality if we don't need to?

We could change this to something like this:

if cudf.get_option("mode.pandas_compatible"):
    equal_columns = other.index.to_pandas().equals(
        self._data.to_pandas_index()
    )
else:
    can_use_self_column_name = (
        list(other._index._data.names) == self._data._level_names
    )
    if not can_use_self_column_name:
        can_use_self_column_name = other.index.to_pandas().equals(
            self._data.to_pandas_index()
        )

Switched to a cheaper condition first.
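
The idea in this thread, running the cheap comparison first and only paying for the expensive one when it is needed, can be sketched in plain Python. This is a minimal illustration, not the code that was merged: `can_use_self_column_name` is written here as a hypothetical free function, and `expensive_equals` is a stand-in for the `to_pandas()` conversion and `equals` call.

```python
def can_use_self_column_name(self_names, other_names, expensive_equals):
    """Return True if the names match or the expensive comparison succeeds.

    `or` short-circuits, so `expensive_equals` is only called when the
    cheap name comparison fails.
    """
    return list(other_names) == list(self_names) or expensive_equals()


calls = []


def expensive_equals():
    # Hypothetical stand-in for
    # other.index.to_pandas().equals(self._data.to_pandas_index())
    calls.append("expensive")
    return True


# Names match: the expensive check is skipped entirely.
fast = can_use_self_column_name([None], [None], expensive_equals)
calls_after_fast = len(calls)

# Names differ: the expensive check runs as a fallback.
slow = can_use_self_column_name(["x"], [None], expensive_equals)
calls_after_slow = len(calls)

print(fast, calls_after_fast, slow, calls_after_slow)
```

Ordering the operands of `or` this way changes only how much work is done, not the boolean result, provided the expensive check has no side effects the caller relies on.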

vyasr

LGTM

python/cudf/cudf/core/dataframe.py

galipremsagar · 2023-07-28T20:21:37Z

/merge

galipremsagar added 4 commits July 27, 2023 10:37

Fix binops and their column orderings

5f49fa8

Add tests

85a8c96

test improvements

0c53920

one more test

998253d

galipremsagar added the labels bug (Something isn't working), 3 - Ready for Review (Ready for review by team), 4 - Needs cuDF (Python) Reviewer, and non-breaking (Non-breaking change) Jul 27, 2023

galipremsagar requested a review from a team as a code owner July 27, 2023 19:38

galipremsagar self-assigned this Jul 27, 2023

galipremsagar requested review from vyasr, charlesbluca and shwina July 27, 2023 19:38

github-actions bot added the Python (Affects Python cuDF API.) label Jul 27, 2023

bdice reviewed Jul 27, 2023

python/cudf/cudf/core/dataframe.py

python/cudf/cudf/core/dataframe.py (outdated)

python/cudf/cudf/core/indexed_frame.py (outdated)

Apply suggestions from code review

e6f3cd0

Co-authored-by: Bradley Dice <bdice@bradleydice.com>

wence- approved these changes Jul 28, 2023

wence- mentioned this pull request Jul 28, 2023

Preserve names of column object in various APIs #13772

Merged

3 tasks

galipremsagar added 3 commits July 28, 2023 10:09

merge

f4c098e

Switch to more optimal condition check

fd50c21

Switch to more optimal condition check

10a2ee6

vyasr approved these changes Jul 28, 2023

galipremsagar added the label 5 - Ready to Merge (Testing and reviews complete, ready to merge) and removed the labels 3 - Ready for Review (Ready for review by team) and 4 - Needs cuDF (Python) Reviewer Jul 28, 2023

bdice reviewed Jul 28, 2023

python/cudf/cudf/core/dataframe.py (outdated)

Update dataframe.py

0b57672

bdice reviewed Jul 28, 2023

python/cudf/cudf/core/dataframe.py (outdated)

galipremsagar added 2 commits July 28, 2023 11:25

remove compatibility specific behavior

2af8448

fix

3711a54

bdice approved these changes Jul 28, 2023

Merge branch 'branch-23.10' into fix_binops

d3b9d7e

rapids-bot bot merged commit 70b8f1f into rapidsai:branch-23.10 Jul 28, 2023
54 checks passed
