
WRONG code: LoopUnroll / SCEVExpander with i128 induction variable. #80289

Closed
@JonPsson1


This reduced (csmith) test case seems well-defined and should print '6':

#include <stdio.h>

char C = 0;
__int128 IW = 0;
int *IPtr1, *IPtr2;

struct S2 { int f3; };
volatile struct S2 g_1100;

int main() {
  for (; C <= 5; C += 1)
    for (; IW <= 5; IW += 1) {
      IPtr1 = IPtr2;
      g_1100;
    }
  int crc = IW;
  printf("checksum = %d\n", crc);
}

clang -target s390x-linux-gnu -march=z16 -O3  -mllvm -enable-load-pre=false -o ./a.out -mllvm -unroll-max-count=3; ./a.out
checksum = 7
clang -target s390x-linux-gnu -march=z16 -O3  -mllvm -enable-load-pre=false -o ./a.out -mllvm -unroll-max-count=2; ./a.out
checksum = 6

However, when the loop is unrolled 3 times (not 2 or 4), LoopUnroll creates a prologue loop that is supposed to run the extra iterations, whose count is computed in the preheader (LoopUnrollRuntime.cpp:766):

for.body5.preheader:                              ; preds = %for.cond2thread-pre-split
  %2 = sub i128 6, %.pr121517
  %3 = freeze i128 %2
  %4 = add i128 %3, 18446744073709551615
  %5 = urem i128 %4, 3
  %6 = add i128 %5, 1
  %xtraiter = urem i128 %6, 3
  %lcmp.mod = icmp ne i128 %xtraiter, 0
  br i1 %lcmp.mod, label %for.body5.prol.preheader, label %for.body5.prol.loopexit

The constant added in %4 is supposed to be i128 -1, so UINT64_MAX (18446744073709551615, i.e. i64 -1 zero-extended to 128 bits) is wrong.

After LoopUnroll, i128 version (left) vs. i64 version (right):


for.body5.preheader:                            for.body5.preheader:                         
  %2 = sub i128 6, %.pr121517                 |   %2 = sub i64 6, %.pr121517
  %3 = freeze i128 %2                         |   %3 = freeze i64 %2
  %4 = add i128 %3, 18446744073709551615      |   %4 = add i64 %3, -1
  %5 = urem i128 %4, 3                        |   %5 = urem i64 %4, 3
  %6 = add i128 %5, 1                         |   %6 = add i64 %5, 1
  %xtraiter = urem i128 %6, 3                 |   %xtraiter = urem i64 %6, 3
  %lcmp.mod = icmp ne i128 %xtraiter, 0       |   %lcmp.mod = icmp ne i64 %xtraiter, 0
  br i1 %lcmp.mod, label %for.body5.prol.preh     br i1 %lcmp.mod, label %for.body5.prol.preh

%4 is later optimized into a sub i128 with the folded constant 18446744073709551621 (2^64 + 5), which should really be 5.

@nikic @boxu-zhang @xiangzh1 @preames @uweigand
