Request For Help: unexplained ArrowInvalid overflow #61776
I assume that is indeed what is happening here, because there is in any case an (unfortunately long-standing) bug for exactly this case: apache/arrow#35088 (rereading the issue and based on Weston's comment, it seems the fix should actually be quite easy). A workaround might be to cast the duration to int64 (which should be zero-copy), and then the subtract_checked kernel should work correctly. |
And so you can indeed see that the underlying values would overflow if the value masked by the null is not ignored:
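The arithmetic behind that statement can be sketched in plain Python, assuming the masked slot holds iNaT (the minimum int64 value) and using made-up nanosecond timestamps for the other operands:

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
INAT = INT64_MIN  # value stored in the data buffer behind the null


def fits_in_int64(value: int) -> bool:
    """Return True if `value` is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


other = 1_700_000_000_000_000_000  # hypothetical timestamp in ns since epoch
valid_value = 1_600_000_000_000_000_000  # hypothetical non-null entry

masked_diff = INAT - other  # what a kernel computes if it ignores the null bit
valid_diff = valid_value - other  # an ordinary, in-range subtraction

print(fits_in_int64(masked_diff), fits_in_int64(valid_diff))
```

Only the masked entry leaves the int64 range, which matches the observation that the exception appears only for slices containing a null.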
|
What do you want to change here exactly? The issue is that pyarrow allows |
Seeing #61773, I understand the issue now (it's also related to the fact that we specify [...]). In the end, the reason this overflow comes up in the tests because of this change is that in [...]. So one workaround would be to also fill the created pyarrow array with zeros. One potential way of doing this:
|
I eventually stumbled on that idea long after posting. Will give it a go in #61773. Thank you. |
Because of #61775 and to address failures in #61732 I'm trying out calling pd.to_datetime in ArrowEA._box_pa_array when we have a timestamp type. AFAICT this isn't breaking anything at construction-time (see the assertion this adds, which isn't failing in any tests). What is breaking is subsequent subtraction operations, that are raising
pyarrow.lib.ArrowInvalid: overflow
It is happening on both sub and rsub ops. When I try operating on a subset of the array, it looks like the exception only happens when I use a slice that contains a null.
To examine the buffers, I added a breakpoint after the assertion in the diff. In the relevant case, alt[8] is null:
So my current hypothesis is that when we get to the pc.subtract_checked call, it isn't skipping the iNaT entry despite the null bit, and the subtraction for that entry is overflowing. This seems likely unintentional and may be an upstream bug cc @jorisvandenbossche?
Regardless of whether it is an upstream bug, I could use guidance on how to make the construction with to_datetime work. Filtering out Decimal(NaN) manually would be pretty inefficient.