GH-49168: [Python] map date32/date64 to datetime64[s] in to_pandas_dtype#49210
GH-49168: [Python] map date32/date64 to datetime64[s] in to_pandas_dtype#49210PratyushD21 wants to merge 2 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
What about datetime64[D], wouldn’t that be more appropriate? |
|
@PratyushD21 good point, thanks. @mroeschke will know it the best, I remember pandas also changed how it handles days since 2023. |
Yeah pandas does not natively support storing The changes look good, but I don't have permissions to run the CI in Arrow. You may want to build Arrow/PyArrow from source and run the test suite to see if other tests passes, especially the parquet tests based on the history in #35656 |
|
Hi @mroeschke, I built from Source, and ran the test suite (especially the parquet tests). The tests have passed. |
|
I kicked off the CI, ping me if you need to start it again. |
|
@rok would it be possible to re-trigger the CI? I have added additional check for pandas<=2. Now, the CIs should pass. |
|
|
|
I don't think we should change this for |
raulcd
left a comment
There was a problem hiding this comment.
I am sorry but I don't understand this. Can someone explain me why returning s is better than ms in this specific case? I don't use this specific API and I might be missing something obvious (again no expert on this area) but why this change is preferred and not just a matter of preference?
Ah sure, I think there should be consistency in both APIs. For Conversely it appears |
For |
OK that's reasonable. I understand if ultimately the change is not made; we just found it unexpected in cuDF |
|
Hi @mroeschke @jorisvandenbossche, if I understand correctly, we would like to perform the same change for |
Rationale for this change
Fixes #49168
What changes are included in this PR?
As per the issue raised in [Python] Consider pa.date32/64.to_pandas_dtype() returning datetime64[s] instead of datetime64[ms], it was observed that no
mstimestamp was supported in parquet. To rectify that, thepa.Date32/64.to_pandas_dtypewill returndatetime64[s].python/pyarrow/types.pxi
python/pyarrow/tests/test_schema.py
Are these changes tested?
Yes, the schema test file
python/pyarrow/tests/test_schema.pyhas been updated to include this change.Are there any user-facing changes?
Yes, the return types
pa.date64/32().to_pandas_dtype()should returndatetime64[s].This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)
pa.date32/64.to_pandas_dtype()returningdatetime64[s]instead ofdatetime64[ms]#49168