-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide. #8227
Comments
Investigating As an aside: Halide indexing is col major, which is the convention in math for functions (not matrices), so you may want to be saying tmp[xindex, yindex]. We use this convention because it's familiar to people used to graphics shaders, and because Halide Funcs are intended to be thought of as functions over an infinite integer domain, not matrices. |
Thanks! The yindex/xindex naming is artifact of |
I raised it because it would be a source of inefficiency if tmp is ever compute_at somewhere. Ok, so this is an instance of #3245. What's happening is that the extent required of the input as a function of the extent of the output is:
When N = 5794 this becomes:
Halide then "cleverly" simplifies this to:
which overflows a signed integer. Even for 5793 that's a dumb simplification, because extent being large could easily cause overflow at runtime instead. As I say in #3245, the simplifier should not be allowed to introduce new signed integer overflow, but we haven't been sufficiently strict about that because it's really convenient to pretend that our indices are infinite precision integers. I'll try to figure out why we need this rule, and if there's an alternative |
Makes sense. I am hitting this error somewhat frequently, so a fix would be very helpful! If there is a way to get 64-bit indexing that might also fix it (and help in the case where we have >2gb tensors -- which happens for large embedding tables). In this case For our Halide backend, I'm currently mapping input/outputs to 1D arrays since the indexing/view support in PyTorch is more flexible than Halide, which makes it easier to express things as 1D buffers plus indexing formulas. With views in PyTorch (or ops like I'm creating the 2D I'd love to chat more about if there are better ways to do this mapping, and I'm happy to collaborate since I'm sure PyTorch will expose some areas for improvement in Halide. I had reached to @jrk about this a while back, since I worked closer with him when we were at MIT. |
Fixes #8227 but may break other things. Needs thorough testing. Also, there are more rules like this lurking.
repro.py:
Throws the following error when run:
However, this passes if I reduce
N
by 1:In my original program
N
was larger, but I reduced it to find the error threshold. I observed this error in a much larger program generated bytorch.compile
, but minified it to find the smallest reproducer for the error.I am a bit surprised by this, because
5794*64
is nowhere close to INT32_MAX, and I am only dealing with buffers of a few MB.The text was updated successfully, but these errors were encountered: