This latent bug was exposed through d77067d (improvement of known bits through dominating conditions).
Given this source code snippet:
lArr = new long[N]; // initialized to 0.
for (iv = 99; iv >= 90; --iv) {
    int tmp9 = (iv % 2);
    if (tmp9 == 0) {
        int tmp7 = (iv + 1);
        lArr[tmp7] = 1;
    }
}
print(lArr[99]);
After the loop, lArr[99] should be 1 (it is set when iv is 98). With known bits improved by dominating conditions, we know that we can convert:
tmp7 = add iv, 1
into:
tmp7 = or disjoint iv, 1
(since iv is known to be divisible by 2 at that point, its low bit is zero and the add cannot carry).
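As a quick check of that equivalence, here is a minimal sketch in Java. The class and method names are hypothetical and exist only for illustration. It shows that `iv | 1` and `iv + 1` agree only when the low bit of `iv` is clear, which is exactly the fact the dominating condition provides.

```java
// Hypothetical helper class for illustration; not part of LLVM or the original report.
class OrDisjointSketch {
    // True when (iv | 1) and (iv + 1) produce the same value,
    // which holds exactly when the low bit of iv is zero.
    static boolean orMatchesAdd(long iv) {
        return (iv | 1L) == (iv + 1L);
    }

    public static void main(String[] args) {
        // Print both forms for the first four values the loop visits.
        for (long iv = 99; iv >= 96; --iv) {
            System.out.println(iv + ": or=" + (iv | 1L) + " add=" + (iv + 1L)
                    + " equal=" + orMatchesAdd(iv));
        }
    }
}
```

For even values such as 98 the two forms match (both give 99). For odd values such as 99 they do not: the `or` yields 99 while the `add` yields 100.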
The loop vectorizer then miscompiles this IR.
Before vectorization:
bb15: ; preds = %bb20, %bb8
%iv = phi i64 [ 99, %bb8 ], [ %iv.next, %bb20 ]
%and = and i64 %iv, 1
%icmp17 = icmp eq i64 %and, 0
br i1 %icmp17, label %bb18, label %bb20, !prof !21
bb18: ; preds = %bb15
%or = or disjoint i64 %iv, 1
%getelementptr19 = getelementptr inbounds i64, ptr addrspace(1) %getelementptr, i64 %or
store i64 1, ptr addrspace(1) %getelementptr19, align 8
br label %bb20
bb20: ; preds = %bb18, %bb15
%iv.next = add nsw i64 %iv, -1
%icmp22 = icmp eq i64 %iv.next, 90
br i1 %icmp22, label %bb6, label %bb15, !prof !22
After vectorization:
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%vec.ind = phi <4 x i64> [ <i64 99, i64 98, i64 97, i64 96>, %vector.ph ], [ %vec.ind.next, %vector.body ]
%offset.idx = sub i64 99, %index
%0 = add i64 %offset.idx, 0
%broadcast.splatinsert = insertelement <4 x i64> poison, i64 %index, i64 0
%broadcast.splat = shufflevector <4 x i64> %broadcast.splatinsert, <4 x i64> poison, <4 x i32> zeroinitializer
%vec.iv = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3>
%1 = icmp ule <4 x i64> %vec.iv, <i64 8, i64 8, i64 8, i64 8>
%2 = and <4 x i64> %vec.ind, <i64 1, i64 1, i64 1, i64 1>
%3 = icmp eq <4 x i64> %2, zeroinitializer
%4 = select <4 x i1> %1, <4 x i1> %3, <4 x i1> zeroinitializer
%5 = or i64 %0, 1
%6 = getelementptr i64, ptr addrspace(1) %getelementptr, i64 %5
%7 = getelementptr i64, ptr addrspace(1) %6, i32 0
%8 = getelementptr i64, ptr addrspace(1) %7, i32 -3
%reverse = shufflevector <4 x i1> %4, <4 x i1> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
call void @llvm.masked.store.v4i64.p1(<4 x i64> <i64 1, i64 1, i64 1, i64 1>, ptr addrspace(1) %8, i32 8, <4 x i1> %reverse)
%index.next = add i64 %index, 4
%vec.ind.next = add <4 x i64> %vec.ind, <i64 -4, i64 -4, i64 -4, i64 -4>
%9 = icmp eq i64 %index.next, 12
br i1 %9, label %middle.block, label %vector.body, !prof !3, !llvm.loop !4
Complete snippet transformation here: https://godbolt.org/z/Kvq1zerTs
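In the first vector iteration %0 is 99, so %5 = or i64 99, 1 evaluates to 99 rather than 100 (99 is odd, so the or is not equivalent to the original add), and the store base %8 points at element 96.
With the reversed mask, the first iteration stores:
a[96, 97, 98, 99] <- 1, 0, 1, 0
This leaves a[99] as 0.
Before vectorization, the or was only evaluated, and the store only performed, when iv was divisible by 2. After vectorization, the address is computed from the first lane's iv regardless of that condition.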
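The difference can be reproduced with a small simulation, sketched here in Java. It models the two IR listings by hand and is not generated from them: the class name, the method names, and the array size `N = 128` are assumptions for illustration. The loop bounds follow the IR (iv runs from 99 down to 91), and the masked store is modelled lane by lane.

```java
// Hypothetical simulation for illustration; not part of LLVM or the original report.
class MaskedStoreSketch {
    static final int N = 128; // assumed array size; N is not given in the report

    // Scalar loop, following the pre-vectorization IR (exits when iv.next == 90).
    static long[] scalar() {
        long[] a = new long[N];
        for (long iv = 99; iv != 90; --iv) {
            if ((iv & 1) == 0) {
                a[(int) (iv | 1)] = 1; // iv is even here, so iv | 1 == iv + 1
            }
        }
        return a;
    }

    // Vectorized loop, following the post-vectorization IR with a vector width of 4.
    static long[] vectorized() {
        long[] a = new long[N];
        for (long index = 0; index != 12; index += 4) {
            long lane0 = 99 - index;      // %offset.idx / %0: iv of the first lane
            long base = (lane0 | 1) - 3;  // %5 then %8: the or uses lane 0 only
            for (int lane = 0; lane < 4; ++lane) {
                long iv = lane0 - lane;   // element `lane` of %vec.ind
                boolean active = (index + lane) <= 8 && (iv & 1) == 0; // %4
                // %reverse: mask lane k guards memory slot base + (3 - k).
                if (active) {
                    a[(int) (base + (3 - lane))] = 1;
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("scalar     lArr[99] = " + scalar()[99]);     // prints 1
        System.out.println("vectorized lArr[99] = " + vectorized()[99]); // prints 0
    }
}
```

In this model the scalar loop writes elements 99, 97, 95 and 93, while the vectorized loop writes 98, 96, 94 and 92: every store lands one element too low because the `or` is applied to an odd lane-0 index.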