Description
IIUC, the affine-loop-fusion pass doesn't take into account memory accesses located in regions of ops other than affine.for. In some cases this leads to it removing operations that another piece of code still depends on.
Example (reduced to minimal reproducer from real code):
func.func @func() {
  %cst = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  affine.for %arg0 = 0 to 16 {
    affine.store %cst, %alloc[%arg0] : memref<16xi32>
  }
  affine.for %arg0 = 0 to 16 {
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.cmpi eq, %cst, %cst : i32
  scf.if %0 {
    affine.for %arg0 = 0 to 16 {
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}
Command: mlir-opt -pass-pipeline='builtin.module(func.func(affine-loop-fusion))' example.mlir
Output:
module {
  func.func @func() {
    %alloc = memref.alloc() : memref<1xi32>
    %c1_i32 = arith.constant 1 : i32
    %alloc_0 = memref.alloc() : memref<16xi32>
    affine.for %arg0 = 0 to 16 {
      affine.store %c1_i32, %alloc[0] : memref<1xi32>
      %1 = affine.load %alloc[0] : memref<1xi32>
    }
    %0 = arith.cmpi eq, %c1_i32, %c1_i32 : i32
    scf.if %0 {
      affine.for %arg0 = 0 to 16 {
        %1 = affine.load %alloc_0[%arg0] : memref<16xi32>
      }
    }
    return
  }
}
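So %alloc_0 (the original 16-element buffer) is never initialized: the store now writes to the private 1-element %alloc, but the loop inside scf.if, which runs after the fused loop, still loads from %alloc_0.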
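For comparison, here is a hand-written sketch of one result I would consider correct. It is not actual pass output, and it assumes the pass still fuses the first two loops but leaves the store on the original buffer instead of redirecting it to a private 1-element memref. Not fusing at all would be equally valid.

```mlir
// Hand-written expectation, not produced by mlir-opt.
// Assumption: fusion happens, but %alloc is not privatized because
// it is still read inside the scf.if region below.
func.func @func() {
  %c1_i32 = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  affine.for %arg0 = 0 to 16 {
    // The store still targets the original 16-element buffer.
    affine.store %c1_i32, %alloc[%arg0] : memref<16xi32>
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.cmpi eq, %c1_i32, %c1_i32 : i32
  scf.if %0 {
    affine.for %arg0 = 0 to 16 {
      // This load now reads initialized data.
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}
```

The only property that matters here is that every element read inside scf.if has been written by the earlier store.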
Even if I use affine.if instead of scf.if (as in the example below), the behavior is the same:
#set0 = affine_set<(d0) : (1 == 0)>
func.func @func() {
  %cst = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  affine.for %arg0 = 0 to 16 {
    affine.store %cst, %alloc[%arg0] : memref<16xi32>
  }
  affine.for %arg0 = 0 to 16 {
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.index_cast %cst : i32 to index
  affine.if #set0(%0) {
    affine.for %arg0 = 0 to 16 {
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}
Hash of the latest llvm-project commit I tested with: c230138