Miscompile with LoopVectorizer: Incorrect code with disjoint or #81872

Closed
@annamthomas

Description

This latent bug was exposed through d77067d (improvement of known bits through dominating conditions).

Given this source code snippet:

lArr = new long[N]; // initialized to 0. 
for (iv = 99; iv >= 90; --iv) {
    int tmp9 = (iv % 2); 
    if (tmp9 == 0) {
      int tmp7 = (iv + 1); 
      lArr[tmp7] = 1;
    }
}
print(lArr[99]);

This means lArr[99] should be 1 after the loop (it is set when iv is 98). With known bits improved by dominating conditions, we can convert tmp7 = add iv, 1 into tmp7 = or disjoint iv, 1, since iv is known to be divisible by 2 at that point.
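As a rough illustration of why the rewrite is only valid under the guard, here is a minimal Python sketch. The helper names `add_one` and `or_one` and the array size `N = 128` are assumptions made for this example; they do not appear in the original report.

```python
# Sketch: `or x, 1` matches `add x, 1` only when bit 0 of x is clear.
def add_one(iv: int) -> int:
    return iv + 1

def or_one(iv: int) -> int:
    # Plain bitwise or; the IR's `disjoint` flag additionally asserts iv & 1 == 0.
    return iv | 1

for iv in range(99, 89, -1):
    if iv % 2 == 0:
        # Guarded path: the dominating condition makes the rewrite valid.
        assert or_one(iv) == add_one(iv)
    else:
        # Unguarded path: the two forms differ by exactly one.
        assert or_one(iv) == add_one(iv) - 1

# Scalar reference loop from the snippet above (N is an assumed array size).
N = 128
l_arr = [0] * N
for iv in range(99, 89, -1):
    if iv % 2 == 0:
        l_arr[iv + 1] = 1
print(l_arr[99])  # 1
```

The equivalence holds only on the path where iv is even, so any transformation that evaluates the `or` for an odd iv gets a different index than the original `add` would have produced.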

When we vectorize this IR, the vectorizer generates incorrect code:

Before vectorization:

bb15:                                             ; preds = %bb20, %bb8
  %iv = phi i64 [ 99, %bb8 ], [ %iv.next, %bb20 ]
  %and = and i64 %iv, 1
  %icmp17 = icmp eq i64 %and, 0
  br i1 %icmp17, label %bb18, label %bb20, !prof !21

bb18:                                             ; preds = %bb15
  %or = or disjoint i64 %iv, 1
  %getelementptr19 = getelementptr inbounds i64, ptr addrspace(1) %getelementptr, i64 %or
  store i64 1, ptr addrspace(1) %getelementptr19, align 8
  br label %bb20

bb20:                                             ; preds = %bb18, %bb15
  %iv.next = add nsw i64 %iv, -1
  %icmp22 = icmp eq i64 %iv.next, 90
  br i1 %icmp22, label %bb6, label %bb15, !prof !22

After vectorization:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <4 x i64> [ <i64 99, i64 98, i64 97, i64 96>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %offset.idx = sub i64 99, %index
  %0 = add i64 %offset.idx, 0
  %broadcast.splatinsert = insertelement <4 x i64> poison, i64 %index, i64 0
  %broadcast.splat = shufflevector <4 x i64> %broadcast.splatinsert, <4 x i64> poison, <4 x i32> zeroinitializer
  %vec.iv = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3>
  %1 = icmp ule <4 x i64> %vec.iv, <i64 8, i64 8, i64 8, i64 8>
  %2 = and <4 x i64> %vec.ind, <i64 1, i64 1, i64 1, i64 1>
  %3 = icmp eq <4 x i64> %2, zeroinitializer
  %4 = select <4 x i1> %1, <4 x i1> %3, <4 x i1> zeroinitializer
  %5 = or i64 %0, 1
  %6 = getelementptr i64, ptr addrspace(1) %getelementptr, i64 %5
  %7 = getelementptr i64, ptr addrspace(1) %6, i32 0
  %8 = getelementptr i64, ptr addrspace(1) %7, i32 -3
  %reverse = shufflevector <4 x i1> %4, <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  call void @llvm.masked.store.v4i64.p1(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr addrspace(1) %8, i32 8, <4 x i1> %reverse)
  %index.next = add i64 %index, 4
  %vec.ind.next = add <4 x i64> %vec.ind, <i64 -4, i64 -4, i64 -4, i64 -4>
  %9 = icmp eq i64 %index.next, 12
  br i1 %9, label %middle.block, label %vector.body, !prof !3, !llvm.loop !4

Complete snippet transformation here: https://godbolt.org/z/Kvq1zerTs

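In the first vector iteration, the store index is computed from the first lane (iv = 99) as or 99, 1 = 99, whereas 99 + 1 = 100.
After the first iteration, the array becomes:
a[96, 97, 98, 99] <- 1, 0, 1, 0

This leaves a[99] as 0.

Before vectorization, we only computed the or disjoint and did the store if iv was divisible by 2.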
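The effect can be reproduced with a small hand-written model of the vector body. This is only a sketch of the IR shown above, not the vectorizer itself: `run_vector_loop`, `masked_store`, and `N = 128` are hypothetical names and values chosen for this example, and the `add` variant is included purely for comparison with the original scalar semantics.

```python
VF = 4          # vector width, from the <4 x i64> types in the IR
N = 128         # assumed array size; the report leaves N open
LAST_ITER = 8   # tail-folding bound, from `icmp ule %vec.iv, 8`

def masked_store(arr, base, mask):
    # Models llvm.masked.store of a splat of 1: write only the active lanes.
    for lane, active in enumerate(mask):
        if active:
            arr[base + lane] = 1

def run_vector_loop(first_lane_index):
    arr = [0] * N
    for index in range(0, 12, VF):
        # %vec.ind: <99, 98, 97, 96> on the first iteration, stepping by -4.
        ivs = [99 - index - lane for lane in range(VF)]
        # %4: lane is inside the trip count and iv is even.
        mask = [(index + lane) <= LAST_ITER and iv % 2 == 0
                for lane, iv in enumerate(ivs)]
        # %5 / %8: index derived from the first lane only, then offset by -3.
        base = first_lane_index(99 - index) - (VF - 1)
        # %reverse: the mask is reversed to match ascending addresses.
        masked_store(arr, base, list(reversed(mask)))
    return arr

with_or = run_vector_loop(lambda iv: iv | 1)    # what the vectorized IR computes
with_add = run_vector_loop(lambda iv: iv + 1)   # what the original add would give

print(with_or[96:100])   # [1, 0, 1, 0]
print(with_add[96:100])  # [0, 1, 0, 1]
```

In this model the `or` variant leaves a[99] as 0, while the `add` variant sets a[99] to 1, which matches the scalar loop.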
