EarlyCSE misses redundant load elimination from base pointer, but succeeds when the pointer has a non-zero offset #138678

Open
@GINN-Imp

Description

For the following case, EarlyCSE fails to eliminate a redundant load, so it does not rewrite store i8 %5, ptr @g2, align 1 --> store i8 %2, ptr @g2, align 1.
As a result, later passes cannot optimize it further (store i8 %5, ptr @g2, align 1 --> store i8 0, ptr @g2, align 1).

In contrast, if the pointer is offset using getelementptr (e.g., gep %ptr, 32), the same redundant load pattern is optimized correctly.

I tried to inspect the debug output of early-cse to find the cause, but couldn't find an option that prints it. Do opt or clang have a suitable option for this?

Godbolt: https://godbolt.org/z/5GzT9qT9j
alive2 proof: https://alive2.llvm.org/ce/z/Tb5mbH

The reduced case:

@g1 = external global i32
@g2 = external global i8

define void @src(ptr readonly captures(none) %0) local_unnamed_addr #0 {
  %2 = load i8, ptr %0, align 8
  %3 = zext i8 %2 to i32
  store i32 %3, ptr @g1, align 4
  %cond = icmp eq i8 %2, 0
  br i1 %cond, label %4, label %common.ret

common.ret:                                       ; preds = %4, %1
  ret void

4:                                                ; preds = %1
  %5 = load i8, ptr %0, align 8
  store i8 %5, ptr @g2, align 1      ; can be optimized to store i8 %2, ptr @g2, align 1
  br label %common.ret
}

opt -O3 leaves this function unchanged.
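
For reference, the result asked for above would look roughly like the following. This is a hand-written sketch of the two rewrites described in this issue, not actual opt output. The function name @tgt is only illustrative, and the attribute group #0 is dropped because its definition is not part of the reduced case.

```llvm
@g1 = external global i32
@g2 = external global i8

; Hand-written expected form of @src (the name @tgt is hypothetical).
define void @tgt(ptr readonly captures(none) %0) local_unnamed_addr {
  %2 = load i8, ptr %0, align 8
  %3 = zext i8 %2 to i32
  store i32 %3, ptr @g1, align 4
  %cond = icmp eq i8 %2, 0
  br i1 %cond, label %4, label %common.ret

common.ret:                                       ; preds = %4, %1
  ret void

4:                                                ; preds = %1
  ; Step 1 (EarlyCSE): the second load of %0 is replaced by %2.
  ; Step 2 (later passes): %2 is known to be 0 on this edge,
  ; so the stored value becomes the constant 0.
  store i8 0, ptr @g2, align 1
  br label %common.ret
}
```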

The case that is optimized:

define void @src2(ptr readonly captures(none) %0) local_unnamed_addr #0 {
  %2 = getelementptr inbounds nuw i8, ptr %0, i64 32
  %3 = load i8, ptr %2, align 8
  %4 = zext i8 %3 to i32
  store i32 %4, ptr @g1, align 4
  %cond = icmp eq i8 %3, 0
  br i1 %cond, label %5, label %common.ret

common.ret:                                       ; preds = %5, %1
  ret void

5:                                                ; preds = %1
  %6 = load i8, ptr %2, align 8
  store i8 %6, ptr @g2, align 1
  br label %common.ret
}

Result of early-cse on src2:

-  %6 = load i8, ptr %2, align 8
-  store i8 %6, ptr @g2, align 1
+  store i8 %3, ptr @g2, align 1

The reduced case is derived from https://github.com/c3lang/c3c/blob/125436d23ef9b7f69837a00ffec168c52839a1dc/src/compiler/llvm_codegen_expr.c#L2376.
