Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error when trying to join datetime and timedelta types with other types #13786

Merged

Conversation

galipremsagar
Copy link
Contributor

Description

This PR fixes an issue when trying to merge a datetime|timdelta type column with another type:

In [1]: import cudf

In [2]: import pandas as pd

In [3]: df = cudf.DataFrame({'a': cudf.Series([10, 20, 30], dtype='datetime64[ns]')})

In [4]: df2 = df.astype('int')

In [5]: df.merge(df2)
Out[5]: 
      a
0  10.0
1  20.0
2  30.0

In [6]: df2.merge(df)
Out[6]: 
      a
0  10.0
1  20.0
2  30.0

In [7]: df
Out[7]: 
                              a
0 1970-01-01 00:00:00.000000010
1 1970-01-01 00:00:00.000000020
2 1970-01-01 00:00:00.000000030

In [8]: df2
Out[8]: 
    a
0  10
1  20
2  30

In [9]: df.to_pandas().merge(df2.to_pandas())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 df.to_pandas().merge(df2.to_pandas())

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/frame.py:10092, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
  10073 @Substitution("")
  10074 @Appender(_merge_doc, indents=2)
  10075 def merge(
   (...)
  10088     validate: str | None = None,
  10089 ) -> DataFrame:
  10090     from pandas.core.reshape.merge import merge
> 10092     return merge(
  10093         self,
  10094         right,
  10095         how=how,
  10096         on=on,
  10097         left_on=left_on,
  10098         right_on=right_on,
  10099         left_index=left_index,
  10100         right_index=right_index,
  10101         sort=sort,
  10102         suffixes=suffixes,
  10103         copy=copy,
  10104         indicator=indicator,
  10105         validate=validate,
  10106     )

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/reshape/merge.py:110, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     93 @Substitution("\nleft : DataFrame or named Series")
     94 @Appender(_merge_doc, indents=0)
     95 def merge(
   (...)
    108     validate: str | None = None,
    109 ) -> DataFrame:
--> 110     op = _MergeOperation(
    111         left,
    112         right,
    113         how=how,
    114         on=on,
    115         left_on=left_on,
    116         right_on=right_on,
    117         left_index=left_index,
    118         right_index=right_index,
    119         sort=sort,
    120         suffixes=suffixes,
    121         indicator=indicator,
    122         validate=validate,
    123     )
    124     return op.get_result(copy=copy)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/reshape/merge.py:707, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, indicator, validate)
    699 (
    700     self.left_join_keys,
    701     self.right_join_keys,
    702     self.join_names,
    703 ) = self._get_merge_keys()
    705 # validate the merge keys dtypes. We may need to coerce
    706 # to avoid incompatible dtypes
--> 707 self._maybe_coerce_merge_keys()
    709 # If argument passed to validate,
    710 # check if columns specified as unique
    711 # are in fact unique.
    712 if validate is not None:

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/core/reshape/merge.py:1344, in _MergeOperation._maybe_coerce_merge_keys(self)
   1342 # datetimelikes must match exactly
   1343 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):
-> 1344     raise ValueError(msg)
   1345 elif not needs_i8_conversion(lk.dtype) and needs_i8_conversion(rk.dtype):
   1346     raise ValueError(msg)

ValueError: You are trying to merge on datetime64[ns] and int64 columns. If you wish to proceed you should use pd.concat

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added bug Something isn't working 3 - Ready for Review Ready for review by team Python Affects Python cuDF API. 4 - Needs cuDF (Python) Reviewer breaking Breaking change labels Jul 31, 2023
@galipremsagar galipremsagar self-assigned this Jul 31, 2023
@galipremsagar galipremsagar requested a review from a team as a code owner July 31, 2023 12:58
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Do we have tests for all the cases that are newly handled, though? It seems like there is more new code than accompanying tests.

python/cudf/cudf/core/join/_join_helpers.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/join/_join_helpers.py Outdated Show resolved Hide resolved
@galipremsagar
Copy link
Contributor Author

Looks fine to me. Do we have tests for all the cases that are newly handled, though? It seems like there is more new code than accompanying tests.

Yup, thanks for catching it. Added additional tests - Now we should be having full coverage.

@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs cuDF (Python) Reviewer labels Jul 31, 2023
@galipremsagar
Copy link
Contributor Author

@bdice Thanks for the quick review. I'm going ahead and merging this PR. If you think we need more testing coverage let me know I can get that taken care as a separate PR.

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 02357b1 into rapidsai:branch-23.10 Jul 31, 2023
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change bug Something isn't working Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants