
Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide. #8227

Closed
jansel opened this issue May 21, 2024 · 4 comments · Fixed by #8234
Labels
dev_meeting Topic to be discussed at the next dev meeting

Comments

jansel (Contributor) commented May 21, 2024

repro.py:

import halide as hl
import sys
import tempfile

N = 5794


@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr0 = hl.OutputBuffer(hl.Float(32), 1)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr0 = g.out_ptr0
        xindex = hl.Var("xindex")
        yindex = hl.Var("yindex")
        tmp0 = hl.Func()
        tmp0[yindex, xindex] = in_ptr0[yindex + (64 * xindex)]
        out_ptr0_index = hl.Var("out_ptr0_index")
        out_ptr0[out_ptr0_index] = tmp0[
            out_ptr0_index // N, out_ptr0_index % N
        ]

        # same issue with autoscheduler enabled
        # in_ptr0.set_estimates([hl.Range(0, 1914624)])
        # out_ptr0.set_estimates([hl.Range(0, 1914624)])


with tempfile.TemporaryDirectory() as outdir:
    sys.argv = [
        "repro.py",
        "-g", "kernel",
        "-o", outdir,
        "-f", "halide_kernel",
        "-e", "static_library,h,schedule",
        "target=host-large_buffers-debug",
    ]
    hl.main()

Throws the following error when run:

Unhandled exception: Error: Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide.

Traceback (most recent call last):
  File "/home/jansel/pytorch/repro.py", line 39, in <module>
    hl.main()
RuntimeError: Generator failed: -1

However, this passes if I reduce N by 1:

N = 5793

I observed this error in a much larger program generated by torch.compile; in my original program N was larger, but I minified it to this smallest reproducer and reduced N to find the threshold where the error first appears.

I am a bit surprised by this, because 5794*64 is nowhere close to INT32_MAX, and I am only dealing with buffers of a few MB.
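To quantify that surprise, here is a quick plain-Python arithmetic check (the 1914624 element count is taken from the commented-out set_estimates in the repro):

```python
# Quick check: every index and buffer size in the repro is far below
# INT32_MAX, so naive index overflow is not an obvious culprit.
N = 5794
INT32_MAX = 2**31 - 1

max_index_product = N * 64        # largest product appearing in the indexing
buffer_elems = 1914624            # from the commented set_estimates
buffer_bytes = buffer_elems * 4   # float32 elements

print(max_index_product)          # 370816, tiny compared to INT32_MAX
print(buffer_bytes / 2**20)       # ~7.3 MiB
assert max_index_product < INT32_MAX and buffer_elems < INT32_MAX
```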

abadams (Member) commented May 23, 2024

Investigating

As an aside: Halide indexing is column-major, which is the convention in math for functions (not matrices), so you may want to be writing tmp[xindex, yindex]. We use this convention because it's familiar to people used to graphics shaders, and because Halide Funcs are intended to be thought of as functions over an infinite integer domain, not matrices.
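The layout point can be sketched in plain Python (illustrative sizes, not Halide code): in a column-major layout the first index has stride 1, so putting the unit-stride coordinate first keeps consecutive iterations adjacent in memory.

```python
# Column-major vs row-major linearization of a 2D coordinate onto a
# flat buffer. In column-major layout the FIRST index is unit-stride.
def col_major(x, y, extent_x):
    return x + y * extent_x   # first index has stride 1

def row_major(x, y, extent_y):
    return x * extent_y + y   # last index has stride 1

extent_x, extent_y = 4, 3
# Stepping the first index by 1 moves 1 element in col-major layout,
# but extent_y elements in row-major layout.
assert col_major(1, 0, extent_x) - col_major(0, 0, extent_x) == 1
assert row_major(1, 0, extent_y) - row_major(0, 0, extent_y) == extent_y
```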

jansel (Contributor, Author) commented May 23, 2024

Thanks!

The yindex/xindex naming is an artifact of torch.compile's Triton codegen and of how we map to GPU grids there. I'll swap the order for Halide output, though it shouldn't matter in this example since both the input and output are 1D.

abadams (Member) commented May 23, 2024

I raised it because it would be a source of inefficiency if tmp is ever compute_at somewhere.

Ok, so this is an instance of #3245. What's happening is that the extent required of the input as a function of the extent of the output is:

((extent - 1) / N + (N - 1) * 64) + 1

When N = 5794 this becomes:

(extent - 1) / 5794 + 370753

Halide then "cleverly" simplifies this to:

(extent + 2148142881) / 5794

which overflows a signed integer. Even for 5793 that's a dumb simplification, because extent being large could easily cause overflow at runtime instead. As I say in #3245, the simplifier should not be allowed to introduce new signed integer overflow, but we haven't been sufficiently strict about that because it's really convenient to pretend that our indices are infinite precision integers.
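The arithmetic of that fold checks out in plain Python (the constants 370753 and 2148142881 come from the expressions above):

```python
# Verify the simplifier's fold: (extent - 1) / N + 370753 is rewritten
# as (extent + C) / N with C = 370753 * N - 1, and C overflows int32.
N = 5794
INT32_MAX = 2**31 - 1

offset = (N - 1) * 64 + 1          # 370753
folded_const = offset * N - 1      # 2148142881
assert folded_const == 2148142881
assert folded_const > INT32_MAX    # the folded constant itself overflows

# The two forms agree in infinite-precision integer arithmetic...
for extent in (1, 100, N, 10 * N):
    assert (extent - 1) // N + offset == (extent + folded_const) // N
# ...but the folded form needs an addend that no longer fits in int32.
```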

I'll try to figure out why we need this rule, and if there's an alternative

jansel (Contributor, Author) commented May 23, 2024

Makes sense. I am hitting this error somewhat frequently, so a fix would be very helpful! If there is a way to get 64-bit indexing, that might also fix it (and help in the case where we have >2 GB tensors, which happens with large embedding tables).
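To illustrate why 64-bit indexing would sidestep this particular failure, here is a sketch with int32 wraparound simulated by masking (not Halide's actual codegen):

```python
def as_int32(v):
    """Wrap v to a signed 32-bit value (two's complement)."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

folded_const = 2148142881   # constant from the folded expression above
extent = 1914624            # buffer size from the repro's estimates

# In 32-bit arithmetic the addend wraps around to a negative value;
# in 64-bit (or wider) arithmetic the same sum is computed correctly.
assert as_int32(folded_const + extent) < 0
assert folded_const + extent == 2150057505
```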

In this case in_ptr0 is column major, while out_ptr0 is row major, so an efficient schedule likely involves some tiling so both the load and the store can get vectorized/coalesced. The original kernel has 7 inputs, which were a mix of layouts and broadcasting in one of the two dimensions.

For our Halide backend, I'm currently mapping inputs/outputs to 1D arrays, since the indexing/view support in PyTorch is more flexible than Halide's, which makes it easier to express things as 1D buffers plus indexing formulas. With views in PyTorch (or ops like torch.as_strided), the same buffer can be read multiple times with different strides/dimensions, and inlining view operations in TorchInductor can create indexing formulas too complex to express with linear strides.
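A minimal sketch of that 1D-buffer-plus-formula idea (hypothetical strides, plain Python, not the actual TorchInductor code): each logical coordinate is multiplied by its stride and summed with a storage offset, which is the kind of formula a strided view encodes.

```python
# Sketch: mapping multi-dimensional logical indices onto a flat 1D
# buffer via strides, the way a strided view does. Strides and sizes
# here are illustrative, not taken from the issue.
def flat_index(coords, strides, offset=0):
    return offset + sum(c * s for c, s in zip(coords, strides))

buf = list(range(12))   # one 1D storage buffer

# A 3x4 row-major view of buf: strides (4, 1).
assert flat_index((1, 2), (4, 1)) == 6
# A transposed 4x3 view of the SAME buffer: strides (1, 4).
assert flat_index((2, 1), (1, 4)) == 6
# Both views read buf[6] with no copy; only the formula differs.
assert buf[6] == 6
```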

I'm creating a 2D hl.Func for the body of the kernel in cases where our Triton backend would generate a tiled kernel. This happens when we have stride=1 memory accesses in two or more different dimensions, so whenever we generate a 2D hl.Func, at least one memory access will be transposed with respect to what Halide wants. If everything is contiguous in memory, right now we just generate a 1D hl.Func.

I'd love to chat more about whether there are better ways to do this mapping, and I'm happy to collaborate, since I'm sure PyTorch will expose some areas for improvement in Halide. I had reached out to @jrk about this a while back, since I worked closely with him when we were at MIT.

@abadams abadams added the dev_meeting Topic to be discussed at the next dev meeting label May 24, 2024
abadams added a commit that referenced this issue May 24, 2024
Fixes #8227 but may break other things. Needs thorough testing.

Also, there are more rules like this lurking.
abadams added a commit that referenced this issue Jun 5, 2024
…8234)

Fixes #8227 but may break other things. Needs thorough testing.

Also, there are more rules like this lurking.