ENH: try to preserve the dtype on combine_first for the case where the two DataFrame objects have the same columns #39051

Merged
merged 13 commits into from
Jan 15, 2021
fix isort and flake8 errors
danielhrisca committed Jan 11, 2021
commit 198eaa4ad3b23810dbb5a0f78934e1365c66a983
8 changes: 7 additions & 1 deletion pandas/core/frame.py
@@ -6488,14 +6488,20 @@ def combiner(x, y):

for col in self.columns.intersection(other.columns):
Contributor

this should be a simple list-comprehension
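
A minimal sketch of what this suggestion could look like, not the code that was merged. The loop body collapses into a comprehension (a dict comprehension here, since the diff fills `dtypes[col]`). The helper name `dtypes_to_restore` and the sample frames are hypothetical; `find_common_type` is the pandas-internal helper this PR already imports.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_dtype_equal
from pandas.core.dtypes.cast import find_common_type  # pandas-internal helper


def dtypes_to_restore(combined, left, right):
    """Hypothetical helper: map each shared column whose dtype changed in
    `combined` to the common dtype of the two input columns."""
    return {
        col: find_common_type([left.dtypes[col], right.dtypes[col]])
        for col in left.columns.intersection(right.columns)
        if not is_dtype_equal(combined.dtypes[col], left.dtypes[col])
    }


left = pd.DataFrame({"a": np.array([1, 2], dtype="int64"), "b": [True, False]})
right = pd.DataFrame({"a": np.array([0.5, 1.5]), "b": [False, True]})
# Stand-in for the intermediate result: column "a" was upcast to float64,
# column "b" kept its bool dtype.
combined = pd.DataFrame({"a": np.array([1.0, 2.0]), "b": [True, False]})

restore = dtypes_to_restore(combined, left, right)
```

Only column `a` ends up in `restore`, because `b` has the same dtype in `combined` and in `left`.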

try:
# if the column has different dtype in the
# DataFrame objects then add the common dtype
# to the columns dtype conversion dict
if combined.dtypes[col] != self.dtypes[col]:
Contributor

use is_dtype_equal here
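
To illustrate the suggestion: `is_dtype_equal` returns a plain bool and treats dtypes it cannot compare as unequal, so a NumPy dtype can be checked against a pandas extension dtype without guarding the comparison. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_dtype_equal

numpy_int = np.dtype("int64")
nullable_int = pd.Int64Dtype()       # pandas extension dtype
categorical = pd.CategoricalDtype()  # pandas extension dtype

same = is_dtype_equal(numpy_int, np.dtype("int64"))  # identical NumPy dtypes
by_name = is_dtype_equal(numpy_int, "int64")         # dtype given as a string
mixed_int = is_dtype_equal(numpy_int, nullable_int)  # NumPy vs. extension dtype
mixed_cat = is_dtype_equal(numpy_int, categorical)   # NumPy vs. extension dtype
```

The first two checks are true and the two mixed checks are false, with no `TypeError` to catch.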

dtypes[col] = find_common_type(
[self.dtypes[col], other.dtypes[col]]
)
except TypeError:
Contributor

we never want multiple nested try/excepts, as these tend to hide errors.
in fact you should not need one here at all. find_common_type will always succeed (the result could of course be object).
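
A small sketch of the point about `find_common_type`: for dtypes with no narrower common type it falls back to `object` instead of raising, so the call does not need a `try/except` around it. The import path is the pandas-internal one used in this PR; the variable names are illustrative.

```python
import numpy as np
from pandas.core.dtypes.cast import find_common_type  # pandas-internal helper

# Numeric dtypes promote to a wider numeric dtype.
int_and_float = find_common_type([np.dtype("int64"), np.dtype("float64")])

# Anything mixed with object stays object.
int_and_object = find_common_type([np.dtype("int64"), np.dtype("object")])

# pandas does not mix bool with numeric dtypes, so this is object as well.
bool_and_int = find_common_type([np.dtype("bool"), np.dtype("int64")])
```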

# numpy dtype was compared with pandas dtype
try:
# just try to apply the initial column dtype
combined[col] = combined[col].astype(self.dtypes[col])
-except:
+except ValueError:
# could not apply the initial dtype, so skip
pass

if dtypes:
3 changes: 2 additions & 1 deletion pandas/tests/frame/methods/test_combine_first.py
@@ -3,10 +3,11 @@
import numpy as np
import pytest

+from pandas.core.dtypes.cast import find_common_type
+
import pandas as pd
from pandas import DataFrame, Index, MultiIndex, Series
import pandas._testing as tm
-from pandas.core.dtypes.cast import find_common_type


class TestDataFrameCombineFirst: