Fix binary operation column ordering and missing column issues #13778

Merged (12 commits) on Jul 28, 2023

Conversation

galipremsagar
Contributor

Description

This PR fixes several cases where binary operations between DataFrames and Series with columns of certain dtypes yield incorrect results, incorrect result column types, or drop columns altogether.
It also ensures that the resulting column order matches that of pandas binary operations when pandas compatibility mode is enabled.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar added the labels "bug" (Something isn't working), "3 - Ready for Review" (Ready for review by team), "4 - Needs cuDF (Python) Reviewer", and "non-breaking" (Non-breaking change) on Jul 27, 2023
@galipremsagar requested a review from a team as a code owner on July 27, 2023 19:38
@galipremsagar self-assigned this on Jul 27, 2023
@github-actions bot added the "Python" label (Affects Python cuDF API.) on Jul 27, 2023
@galipremsagar
Contributor Author

Note that there is some intersection with changes in #13772

python/cudf/cudf/core/dataframe.py (resolved)
python/cudf/cudf/core/dataframe.py (outdated, resolved)
python/cudf/cudf/core/indexed_frame.py (outdated, resolved)
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Contributor

@wence- wence- left a comment

I think this looks fine, with one comment: can we potentially avoid one "expensive" check?

Comment on lines 1840 to 1845
equal_columns = other.index.to_pandas().equals(
    self._data.to_pandas_index()
)
can_use_self_column_name = equal_columns or (
    list(other._index._data.names) == self._data._level_names
)
Contributor

Can we do the cheap thing first, so that we don't convert to pandas and check equality if we don't need to?

Contributor Author

We could change this to something like this:

if cudf.get_option("mode.pandas_compatible"):
    equal_columns = other.index.to_pandas().equals(self._data.to_pandas_index())
else:
    can_use_self_column_name = (list(other._index._data.names) == self._data._level_names)
    if not can_use_self_column_name:
        can_use_self_column_name = other.index.to_pandas().equals(self._data.to_pandas_index())

Contributor Author

Switched to a cheaper condition first.
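
To illustrate the reviewer's point in isolation, here is a minimal sketch of putting the cheap check first and relying on `or` short-circuiting. It is not the cuDF code: `names_match`, `labels_equal`, and the `calls` log are hypothetical stand-ins for the level-name comparison and the pandas conversion plus `equals()` call discussed above.

```python
# Hypothetical stand-ins, not cuDF APIs: they only mimic a cheap metadata
# comparison and a costly element-wise comparison.
calls = []


def names_match(left_names, right_names):
    """Cheap check: compare two short lists of level names."""
    calls.append("cheap")
    return list(left_names) == list(right_names)


def labels_equal(left_labels, right_labels):
    """Expensive check: stands in for converting to pandas and calling equals()."""
    calls.append("expensive")
    return list(left_labels) == list(right_labels)


def can_use_self_column_name(left_names, right_names, left_labels, right_labels):
    # `or` short-circuits: labels_equal runs only when names_match is False.
    return names_match(left_names, right_names) or labels_equal(
        left_labels, right_labels
    )


# Names match, so the expensive comparison is skipped entirely.
first = can_use_self_column_name([None], [None], ["a", "b"], ["b", "a"])
calls_after_first = list(calls)

# Names differ, so the expensive comparison runs as a fallback.
calls.clear()
second = can_use_self_column_name(["x"], [None], ["a", "b"], ["a", "b"])
calls_after_second = list(calls)
```

The result is the same as evaluating both conditions eagerly; only the order of evaluation changes, so the costly comparison is paid only when the cheap one fails.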

Contributor

@vyasr vyasr left a comment

LGTM

@galipremsagar added the label "5 - Ready to Merge" (Testing and reviews complete, ready to merge) and removed the labels "3 - Ready for Review" (Ready for review by team) and "4 - Needs cuDF (Python) Reviewer" on Jul 28, 2023
@galipremsagar
Contributor Author

/merge

@rapids-bot bot merged commit 70b8f1f into rapidsai:branch-23.10 on Jul 28, 2023
54 checks passed
Labels
"5 - Ready to Merge" (Testing and reviews complete, ready to merge), "bug" (Something isn't working), "non-breaking" (Non-breaking change), "Python" (Affects Python cuDF API.)
4 participants