Auto-vectorizer generates different code for the main loop and for the trailing section

Repro and code are available on Godbolt: https://godbolt.org/z/c936s31hn

Function `g()` in the example is expected to be auto-vectorized and to produce the same results as function `f()`, which is not vectorized. This holds for the code generated for the "main" part of the loop: I can see it in the generated assembly and in the test results.

However, the code generated for the **trailing** part of the loop does not match the code generated for the "main" part of the loop: the trailing section fails to fuse a multiplication and an addition.

The example program shows this effect:
 - it compares the output of `f()` and `g()` for 1 element
 - it compares the output of `g()` for 5 equal input elements, showing that the fifth result (coming out of the trailing part of the loop) is different from the rest.
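
To illustrate why a fused and an unfused multiply-add can disagree at all, here is a minimal standalone sketch. It is not the code from the Godbolt link: the function names `mul_add_unfused` and `mul_add_fused` and the input values are made up for illustration, and it assumes IEEE 754 single precision with round-to-nearest-even and no `-ffast-math`.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round the product to float, then add.
// The volatile temporary forces the product to be rounded and stored,
// so the compiler cannot contract the two operations into one FMA.
float mul_add_unfused(float a, float b, float c) {
    volatile float product = a * b;  // first rounding
    return product + c;              // second rounding
}

// Hypothetical helper: fused multiply-add, a * b + c with a single rounding.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the exact product is `1 + 2^-11 + 2^-24`. The unfused version rounds that product to `1 + 2^-11` before the addition and returns `0`, while the fused version keeps the low bits and returns `2^-24`. A loop whose main body uses one form and whose trailing iterations use the other can therefore produce different values for identical inputs.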

For reference, the Godbolt link also compiles the example with GCC, which generates the expected result.

```
$ clang --version
Ubuntu clang version 19.1.7 (++20250114103320+cd708029e0b2-1~exp1~20250114103432.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
```

Source of installation: https://apt.llvm.org/jammy/dists/llvm-toolchain-jammy-19/main/

Auto-vectorizer generates different code for the main loop and for the trailing section #136838

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Auto-vectorizer generates different code for the main loop and for the trailing section #136838

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions