The repro code is available on Godbolt: https://godbolt.org/z/c936s31hn
Function `g()` in the example is expected to be auto-vectorized and to produce the same results as function `f()`, which is not vectorized. This holds for the code generated for the "main" part of the loop, as both the generated assembly and the test results show.
However, the code generated for the trailing (remainder) part of the loop does not match the code generated for the main part: the trailing section fails to fuse a multiplication and an addition into a single fused multiply-add.
The example program shows this effect:
- it compares the output of `f()` and `g()` for 1 element;
- it compares the output of `g()` for 5 equal input elements, showing that the 5th result (produced by the trailing part of the loop) differs from the others.
For reference, the Godbolt link also compares against GCC, which generates the expected result.
```
$ clang --version
Ubuntu clang version 19.1.7 (++20250114103320+cd708029e0b2-1~exp1~20250114103432.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
Source of installation: https://apt.llvm.org/jammy/dists/llvm-toolchain-jammy-19/main/
```