Fix binary operation column ordering and missing column issues #13778
Conversation
Note that there is some intersection with changes in #13772
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
I think this looks fine, with one comment: can we potentially avoid one "expensive" check?
python/cudf/cudf/core/dataframe.py
Outdated
equal_columns = other.index.to_pandas().equals(
    self._data.to_pandas_index()
)
can_use_self_column_name = equal_columns or (
    list(other._index._data.names) == self._data._level_names
)
Can we do the cheap thing first, so that we don't convert to pandas and check equality if we don't need to?
We could change this to something like this:
if cudf.get_option("mode.pandas_compatible"):
    equal_columns = other.index.to_pandas().equals(self._data.to_pandas_index())
else:
    can_use_self_column_name = (list(other._index._data.names) == self._data._level_names)
    if not can_use_self_column_name:
        can_use_self_column_name = other.index.to_pandas().equals(self._data.to_pandas_index())
Switched to a cheaper condition first.
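For readers following the thread, the "cheap condition first" idea can be sketched in plain Python. This is only an illustration of short-circuit ordering, not the code that was merged: `names_match`, `labels_equal`, and the `calls` counter are hypothetical stand-ins, with `labels_equal` playing the role of the `to_pandas()` conversion plus `.equals()` comparison.

```python
# Hypothetical stand-ins; none of these names come from cudf.
calls = {"expensive": 0}


def names_match(left_names, right_names):
    """Cheap check: compare two short lists of level names."""
    return list(left_names) == list(right_names)


def labels_equal(left_labels, right_labels):
    """Stand-in for the costly check (convert to pandas, then compare)."""
    calls["expensive"] += 1  # count how often the costly path runs
    return list(left_labels) == list(right_labels)


def can_use_self_column_name(left_names, right_names, left_labels, right_labels):
    # `or` short-circuits: labels_equal only runs when the cheap check fails.
    return names_match(left_names, right_names) or labels_equal(
        left_labels, right_labels
    )
```

When the names already match, the counter stays at zero, so the costly comparison is skipped entirely; it is only paid for when the cheap check is inconclusive.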
LGTM
/merge
Description
This PR fixes several cases where binary operations on DataFrames and Series with columns of certain dtypes yield incorrect results, incorrect result column types, or drop columns altogether.
This PR also ensures that column ordering matches pandas binary-op column ordering when pandas compatibility mode is enabled.
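As a rough illustration of the ordering and missing-column behaviour being targeted, here is a pure-Python sketch. It is not cudf's implementation: `aligned_add` is a hypothetical helper, and it assumes the pandas default for frames whose column labels differ, namely that the result uses the sorted union of the labels and that a column present on only one side comes back as all-NaN. It also assumes both inputs are non-empty and have the same number of rows.

```python
import math


def aligned_add(left, right):
    """Add two {column: list} mappings, aligning on column labels.

    Hypothetical helper for illustration only.
    """
    if list(left) == list(right):
        columns = list(left)  # identical labels: keep the existing order
    else:
        columns = sorted(set(left) | set(right))  # assumed: sorted union
    n_rows = len(next(iter(left.values())))
    result = {}
    for name in columns:
        if name in left and name in right:
            result[name] = [a + b for a, b in zip(left[name], right[name])]
        else:
            # Column exists on one side only: kept in the result, filled with NaN.
            result[name] = [math.nan] * n_rows
    return result


left = {"b": [1, 2], "a": [10, 20]}
right = {"c": [5, 6], "a": [1, 1]}
result = aligned_add(left, right)
```

Here `result` has its columns in the order `a`, `b`, `c`; only `a` holds real sums, while `b` and `c` are retained as NaN columns rather than being dropped.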
Checklist