This reduced (csmith) test case seems well-defined, and should print '6':
#include <stdio.h>

char C = 0;
__int128 IW = 0;
int *IPtr1, *IPtr2;
struct S2 { int f3; };
volatile struct S2 g_1100;
int main() {
  for (; C <= 5; C += 1)
    for (; IW <= 5; IW += 1) {
      IPtr1 = IPtr2;
      g_1100;
    }
  int crc = IW;
  printf("checksum = %d\n", crc);
}
clang -target s390x-linux-gnu -march=z16 -O3 -mllvm -enable-load-pre=false -o ./a.out -mllvm -unroll-max-count=3; ./a.out
checksum = 7
clang -target s390x-linux-gnu -march=z16 -O3 -mllvm -enable-load-pre=false -o ./a.out -mllvm -unroll-max-count=2; ./a.out
checksum = 6
However, with an unroll count of 3 (but not 2 or 4), the LoopUnroller creates a prologue loop that is supposed to run the extra (remainder) iterations. Their number is computed in the preheader (LoopUnrollRuntime.cpp:766):
for.body5.preheader:                              ; preds = %for.cond2thread-pre-split
  %2 = sub i128 6, %.pr121517
  %3 = freeze i128 %2
  %4 = add i128 %3, 18446744073709551615
  %5 = urem i128 %4, 3
  %6 = add i128 %5, 1
  %xtraiter = urem i128 %6, 3
  %lcmp.mod = icmp ne i128 %xtraiter, 0
  br i1 %lcmp.mod, label %for.body5.prol.preheader, label %for.body5.prol.loopexit
The constant used for %4 is supposed to be i128 -1. Instead it is 18446744073709551615, i.e. UINT64_MAX (the bit pattern of i64 -1, not extended to the full 128 bits), which doesn't make sense.
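The difference between the two constants can be reproduced in plain C. The following is a minimal sketch, assuming a compiler with `__int128` support (which the test case already requires). All helper names are hypothetical and are not taken from LLVM. It shows that widening a 64-bit -1 without sign extension yields 18446744073709551615, while sign extension yields the 128-bit -1 that the add needs.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not LLVM code. */

/* Widen a 64-bit -1 by zero extension: the upper 64 bits stay zero,
 * so the result is UINT64_MAX (18446744073709551615). */
unsigned __int128 minus_one_zext(void) {
    uint64_t narrow = (uint64_t)-1;
    return (unsigned __int128)narrow;
}

/* Widen a 64-bit -1 by sign extension: all 128 bits are set,
 * which is the i128 representation of -1. */
unsigned __int128 minus_one_sext(void) {
    int64_t narrow = -1;
    return (unsigned __int128)(__int128)narrow;
}

/* Split a 128-bit value into halves so it can be inspected or printed. */
uint64_t high64(unsigned __int128 v) { return (uint64_t)(v >> 64); }
uint64_t low64(unsigned __int128 v) { return (uint64_t)v; }
```

Adding 6 to `minus_one_sext()` wraps around to 5, whereas adding 6 to `minus_one_zext()` gives 2^64 + 5.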
i128 (left) vs. i64 (right), after LoopUnroller:
for.body5.preheader:                            for.body5.preheader:
  %2 = sub i128 6, %.pr121517                 | %2 = sub i64 6, %.pr121517
  %3 = freeze i128 %2                         | %3 = freeze i64 %2
  %4 = add i128 %3, 18446744073709551615      | %4 = add i64 %3, -1
  %5 = urem i128 %4, 3                        | %5 = urem i64 %4, 3
  %6 = add i128 %5, 1                         | %6 = add i64 %5, 1
  %xtraiter = urem i128 %6, 3                 | %xtraiter = urem i64 %6, 3
  %lcmp.mod = icmp ne i128 %xtraiter, 0       | %lcmp.mod = icmp ne i64 %xtraiter, 0
  br i1 %lcmp.mod, label %for.body5.prol.preh   br i1 %lcmp.mod, label %for.body5.prol.preh
%4 is later optimized to a sub i128 with a folded constant of 18446744073709551621 (6 + 18446744073709551615), which really should be 5 (6 + -1).
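To see how the wrong constant changes the prologue iteration count, the preheader arithmetic can be replayed in C. This is a sketch under stated assumptions: `xtraiter`, `wrong_const` and `right_const` are hypothetical helpers that mirror the IR sequence shown earlier (sub, add, urem, add, urem), and the incoming value %.pr121517 is assumed to be 0, the initial value of IW.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the preheader IR; not LLVM code.
 *   start     : assumed value of %.pr121517 (IW on loop entry)
 *   minus_one : the constant added to form %4
 *   count     : the unroll count
 */
unsigned xtraiter(unsigned __int128 start, unsigned __int128 minus_one,
                  unsigned count) {
    unsigned __int128 v2 = (unsigned __int128)6 - start; /* %2 (and %3; freeze is a no-op here) */
    unsigned __int128 v4 = v2 + minus_one;               /* %4 */
    unsigned __int128 v5 = v4 % count;                   /* %5 */
    unsigned __int128 v6 = v5 + 1;                       /* %6 */
    return (unsigned)(v6 % count);                       /* %xtraiter */
}

/* The constant seen in the i128 IR: UINT64_MAX, upper 64 bits zero. */
unsigned __int128 wrong_const(void) { return (unsigned __int128)UINT64_MAX; }

/* The intended constant: i128 -1, all 128 bits set. */
unsigned __int128 right_const(void) { return ~(unsigned __int128)0; }
```

With a start value of 0 and an unroll count of 3, `xtraiter(0, right_const(), 3)` is 0 while `xtraiter(0, wrong_const(), 3)` is 1, so one extra prologue iteration is requested, which would be consistent with the checksum of 7 instead of 6. For counts of 2 and 4 this arithmetic gives the same remainder for both constants, because 2^64 is divisible by both; the IR actually emitted for power-of-two counts may differ from this sequence.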