Description
For the following case, EarlyCSE fails to eliminate a redundant load, so the rewrite store i8 %5, ptr @g2, align 1 --> store i8 %2, ptr @g2, align 1 never happens.
As a result, subsequent passes cannot optimize the store further (store i8 %5, ptr @g2, align 1 --> store i8 0, ptr @g2, align 1).
In contrast, if the pointer is offset using getelementptr (e.g., gep %ptr, 32), the same redundant load pattern is optimized correctly.
I tried to inspect the debug output of early-cse to find the cause, but couldn't find an option that prints it. Does opt or clang have a suitable option for this?
Godbolt: https://godbolt.org/z/5GzT9qT9j
alive2 proof: https://alive2.llvm.org/ce/z/Tb5mbH
The reduced case:
@g1 = external global i32
@g2 = external global i8
define void @src(ptr readonly captures(none) %0) local_unnamed_addr #0 {
  %2 = load i8, ptr %0, align 8
  %3 = zext i8 %2 to i32
  store i32 %3, ptr @g1, align 4
  %cond = icmp eq i8 %2, 0
  br i1 %cond, label %4, label %common.ret

common.ret:                                       ; preds = %4, %1
  ret void

4:                                                ; preds = %1
  %5 = load i8, ptr %0, align 8
  store i8 %5, ptr @g2, align 1 ; can be optimized to store i8 %2, ptr @g2, align 1
  br label %common.ret
}
opt -O3 leaves this function unchanged.
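For reference, here is a sketch of the output one would hope for, assuming the load in block 4 is forwarded from %2 and the branch condition (%2 == 0) is then propagated into the store. The function name @tgt is hypothetical (following the src/tgt naming that Alive2 uses), and the attribute group #0 and local_unnamed_addr are dropped so the snippet stands alone; it is not actual opt output.

```llvm
@g1 = external global i32
@g2 = external global i8

; Hypothetical desired result for @src (not produced by opt today).
define void @tgt(ptr readonly captures(none) %0) {
  %2 = load i8, ptr %0, align 8
  %3 = zext i8 %2 to i32
  store i32 %3, ptr @g1, align 4
  %cond = icmp eq i8 %2, 0
  br i1 %cond, label %4, label %common.ret

common.ret:                                       ; preds = %4, %1
  ret void

4:                                                ; preds = %1
  ; The second load is gone; on this path %2 is known to be 0.
  store i8 0, ptr @g2, align 1
  br label %common.ret
}
```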
The case that is optimized:
define void @src2(ptr readonly captures(none) %0) local_unnamed_addr #0 {
  %2 = getelementptr inbounds nuw i8, ptr %0, i64 32
  %3 = load i8, ptr %2, align 8
  %4 = zext i8 %3 to i32
  store i32 %4, ptr @g1, align 4
  %cond = icmp eq i8 %3, 0
  br i1 %cond, label %5, label %common.ret

common.ret:                                       ; preds = %5, %1
  ret void

5:                                                ; preds = %1
  %6 = load i8, ptr %2, align 8
  store i8 %6, ptr @g2, align 1
  br label %common.ret
}
Diff after running early-cse on src2:
- %6 = load i8, ptr %2, align 8
- store i8 %6, ptr @g2, align 1
+ store i8 %3, ptr @g2, align 1
The reduced case is derived from https://github.com/c3lang/c3c/blob/125436d23ef9b7f69837a00ffec168c52839a1dc/src/compiler/llvm_codegen_expr.c#L2376.