Description
Reproducer:

```python
import helion
import helion.language as hl
import torch


@helion.kernel
def test_tile_id_atomic_add(x: torch.Tensor) -> torch.Tensor:
    out = torch.zeros_like(x, dtype=torch.int32)
    for tile_m, tile_n in hl.tile(x.size()):
        out[tile_m.begin, tile_n.begin] = 1
    return out


x = torch.randn(64, 64, device="cuda")
config = helion.Config(block_sizes=[16, 16])
code = test_tile_id_atomic_add.bind((x,)).to_triton_code(config)
print(config)
compiled_kernel = test_tile_id_atomic_add.bind((x,)).compile_config(config)
```
Between the `to_triton_code` call and the `compile_config` call, `ConfigSpec.flatten_loops` is set to empty.
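One way to observe the mutation is to print the spec around the codegen call. This is a minimal sketch: the `config_spec` attribute on the bound kernel is an assumption on my part; only `ConfigSpec.flatten_loops`, `bind`, `to_triton_code`, and `compile_config` are taken from this report.

```python
# Diagnostic sketch -- `bound.config_spec` is an assumed accessor.
bound = test_tile_id_atomic_add.bind((x,))
print(bound.config_spec.flatten_loops)  # expected: the original (non-empty) entries
code = bound.to_triton_code(config)
print(bound.config_spec.flatten_loops)  # observed: emptied as a side effect
# compile_config then runs against the mutated spec:
compiled_kernel = bound.compile_config(config)
```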
A second reproducer, storing a value loaded from `x` instead of a constant, hits the same error:
```python
import helion
import helion.language as hl
import torch


@helion.kernel
def test_tile_id_atomic_add(x: torch.Tensor) -> torch.Tensor:
    out = torch.zeros_like(x, dtype=torch.int32)
    for tile_m, tile_n in hl.tile(x.size()):
        out[tile_m.begin, tile_n.begin] = x[tile_m.begin, tile_n.begin]
    return out


x = torch.randn(64, 64, device="cuda")
config = helion.Config(block_sizes=[16, 16])
code = test_tile_id_atomic_add.bind((x,)).to_triton_code(config)
print(config)
compiled_kernel = test_tile_id_atomic_add.bind((x,)).compile_config(config)
```
This bug is somehow related to `tile.begin`: if we use `out[tile_m, tile_n] = 1` instead, the bug is not triggered.
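For reference, this is the non-triggering variant described above, writing to the full tile rather than indexing with `tile.begin` (the kernel name here is arbitrary; everything else matches the reproducer):

```python
@helion.kernel
def test_tile_id_no_begin(x: torch.Tensor) -> torch.Tensor:
    out = torch.zeros_like(x, dtype=torch.int32)
    for tile_m, tile_n in hl.tile(x.size()):
        out[tile_m, tile_n] = 1  # full-tile store, no .begin: compiles fine
    return out
```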