BUG: fix AttributeError raised with pd.concat between a None and timezone-aware Timestamp #54428

Merged
merged 22 commits into from
Oct 23, 2023

yuanx749 (Contributor) commented Aug 5, 2023

Takes over #53042. I updated the test, which should produce a FutureWarning.

mroeschke requested a review from jbrockmendel August 7, 2023 16:56
mroeschke added the Missing-data, Reshaping, and Timezones labels Aug 7, 2023
```diff
@@ -2274,7 +2275,7 @@ def _preprocess_slice_or_indexer(
 def make_na_array(dtype: DtypeObj, shape: Shape, fill_value) -> ArrayLike:
     if isinstance(dtype, DatetimeTZDtype):
         # NB: exclude e.g. pyarrow[dt64tz] dtypes
-        i8values = np.full(shape, fill_value._value)
+        i8values = np.full(shape, Timestamp(fill_value)._value)
```
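Assuming the NA `fill_value` that `pd.concat` passes here can be `None` (as the PR title suggests), the old line fails because `None` has no `_value` attribute, while `Timestamp(None)` returns `NaT`. A minimal check of both behaviours, assuming pandas 2.x:

```python
import pandas as pd

fill_value = None  # assumed NA fill reaching make_na_array in the concat case

# Old line: fill_value._value raises AttributeError because None has no _value.
try:
    fill_value._value
    raised = False
except AttributeError:
    raised = True

# New line: pd.Timestamp(None) returns the NaT singleton, a valid NA fill.
coerced = pd.Timestamp(fill_value)
print(raised, coerced is pd.NaT)  # True True
```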
Member:

We need to be sure that the Timestamp here has the same unit as the dtype.
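Why the unit matters: the integer behind a Timestamp is an epoch count in that Timestamp's own unit, so the same instant yields different integers at different resolutions. A small sketch, assuming pandas 2.x (for `Timestamp.as_unit`) and using `asm8` to get the UTC-based `numpy.datetime64`:

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2023-04-10 17:32", tz="US/Pacific")

# asm8 is a UTC-based numpy datetime64 in the Timestamp's own unit,
# so its int64 value is an epoch count in that unit.
secs = int(ts.as_unit("s").asm8.astype(np.int64))
nanos = int(ts.as_unit("ns").asm8.astype(np.int64))

print(secs)   # 1681173120
print(nanos)  # 1681173120000000000
```

Taking the integer from a Timestamp whose unit differs from the dtype's therefore produces a value off by a power of 1000.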

yuanx749 (Contributor, author):

I added the unit.

Is this function only used with an NA fill_value? If so, the unit seems to have no effect, since NaT has no unit.
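The point about NaT can be checked directly in NumPy: the missing-value marker is stored as the minimum int64 regardless of the datetime64 unit, so for an NA fill the choice of unit makes no difference to the integer. (pandas uses the same sentinel for NaT.)

```python
import numpy as np

# NaT is the minimum int64 whatever the datetime64 unit.
nat_s = np.datetime64("NaT", "s").astype(np.int64)
nat_ns = np.datetime64("NaT", "ns").astype(np.int64)
print(nat_s == nat_ns == np.iinfo(np.int64).min)  # True
```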

Member:

It's always NA for concat, but we can get here with a non-NA value via reindex with fill_value.
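A hypothetical helper (`tz_fill_i8` is not a pandas function) sketching how one code path could serve both callers: coerce through `Timestamp` for the NA concat case, and cast a real fill value to `dtype.unit` before taking its integer for the reindex case. This is an illustration under pandas 2.x, not the code that was merged:

```python
import numpy as np
import pandas as pd


def tz_fill_i8(dtype: pd.DatetimeTZDtype, shape, fill_value) -> np.ndarray:
    """Hypothetical helper: int64 fill values for a tz-aware datetime dtype.

    Handles both an NA fill (concat) and a real Timestamp (reindex with
    fill_value), casting the latter to dtype.unit before taking its integer.
    """
    ts = pd.Timestamp(fill_value)  # None or NaT -> pd.NaT
    if ts is pd.NaT:
        # The NaT sentinel is the same int64 for every unit.
        value = np.iinfo(np.int64).min
    else:
        # UTC epoch count expressed in the dtype's unit.
        value = int(ts.as_unit(dtype.unit).asm8.astype(np.int64))
    return np.full(shape, value, dtype=np.int64)


dtype = pd.DatetimeTZDtype(unit="s", tz="US/Pacific")
ts = pd.Timestamp("2023-04-10 17:32", tz="US/Pacific")
print(tz_fill_i8(dtype, 2, None))
print(tz_fill_i8(dtype, 2, ts))
```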

yuanx749 (Contributor, author):

I noticed something odd in the test for reindex with fill_value below.

```python
import pandas as pd
import pandas._testing as tm
from pandas import DataFrame


def test_reindex_tzaware_fill_value(self):
    # GH#52586
    df = DataFrame([[1]])
    ts = pd.Timestamp("2023-04-10 17:32", tz="US/Pacific")
    res = df.reindex([0, 1], axis=1, fill_value=ts)
    assert res.dtypes[1] == pd.DatetimeTZDtype(unit="s", tz="US/Pacific")
    expected = DataFrame({0: [1], 1: [ts]})
    expected[1] = expected[1].astype(res.dtypes[1])
    tm.assert_frame_equal(res, expected)
```

Although the test passes, the repr looks wrong. `print(res)` shows:

```
   0                                   1
0  1 1969-12-31 16:00:01.681173120-08:00
```

whereas it should match `print(expected)`:

```
   0                         1
0  1 2023-04-10 17:32:00-07:00
```
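The odd value is consistent with the seconds-resolution integer being read as nanoseconds: 2023-04-10 17:32 US/Pacific is 1681173120 seconds after the epoch, and 1681173120 nanoseconds after the epoch is 1.68 seconds past midnight UTC, i.e. 16:00:01.681173120 on 1969-12-31 in Pacific time. A sketch reproducing that reading, assuming pandas 2.x:

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2023-04-10 17:32", tz="US/Pacific")
secs = int(ts.as_unit("s").asm8.astype(np.int64))  # epoch seconds

# Treat that seconds count as nanoseconds since the epoch (UTC),
# then show it in the column's timezone.
misread = pd.Timestamp(secs, unit="ns").tz_localize("UTC").tz_convert("US/Pacific")
print(misread)  # 1969-12-31 16:00:01.681173120-08:00
```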

It seems DatetimeArray only works correctly when the unit is 'ns'. Could you take a further look, @jbrockmendel?

```python
def make_na_array(dtype: DtypeObj, shape: Shape, fill_value) -> ArrayLike:
    if isinstance(dtype, DatetimeTZDtype):
        # NB: exclude e.g. pyarrow[dt64tz] dtypes
        i8values = np.full(shape, fill_value._value)
        return DatetimeArray(i8values, dtype=dtype)
```
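One way to keep the unit attached to the raw integers, sketched with public APIs under pandas 2.x (not necessarily what pandas does internally), is to view them as `datetime64` in `dtype.unit` before building the index. With a seconds unit, the round trip gives the expected wall time:

```python
import numpy as np
import pandas as pd

dtype = pd.DatetimeTZDtype(unit="s", tz="US/Pacific")
i8values = np.full(1, 1681173120, dtype=np.int64)  # epoch seconds (UTC)

# View the integers as datetime64 in dtype.unit so the unit travels with them,
# then localize to UTC and convert to the dtype's timezone.
naive = pd.DatetimeIndex(i8values.view(f"M8[{dtype.unit}]"))
result = naive.tz_localize("UTC").tz_convert(dtype.tz)
print(result[0])  # 2023-04-10 17:32:00-07:00
```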

yuanx749 requested a review from jbrockmendel August 21, 2023 14:48

github-actions (Contributor):

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label Oct 11, 2023

mroeschke (Member):

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mroeschke closed this Oct 22, 2023
yuanx749 (Contributor, author):

I'm still waiting for the requested review. Is any update needed from my side?

jbrockmendel reopened this Oct 23, 2023

jbrockmendel (Member):

This looks good to me. Can you move the whatsnew note to 2.2?

yuanx749 (Contributor, author):

> This looks good to me. Can you move the whatsnew note to 2.2?

Thanks. I've moved the note.

mroeschke merged commit 8b7fd0d into pandas-dev:main Oct 23, 2023

mroeschke (Member):

Thanks @yuanx749

yuanx749 deleted the concat-none-dt branch October 24, 2023 02:19
Labels: Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Stale, Timezones (Timezone data dtype)

Successfully merging this pull request may close these issues:
BUG: AttributeError raised with pd.concat between a None and Timestamp

4 participants