Auto-vectorizer generates different code for the main loop and for the trailing section #136838

Open
@dmenendez-gruposantander

Description

Repro and code are available on godbolt: https://godbolt.org/z/c936s31hn

Function g() in the example is expected to be auto-vectorized and to produce the same results as function f(), which is not vectorized. This holds for the code generated for the "main" part of the loop: I can see it both in the generated assembly and in the test results.

However, the code generated for the trailing part of the loop (the leftover iterations) does not match the code generated for the "main" part: the trailing section fails to fuse a multiplication and an addition.

The example program shows this effect:

  • it compares the output of f() and g() for 1 element
  • it compares the outputs of g() for 5 equal input elements, showing that the fifth result (which comes from the trailing part of the loop) differs from the rest.

For reference, the godbolt link compares against gcc, which generates the expected result.

$ clang --version
Ubuntu clang version 19.1.7 (++20250114103320+cd708029e0b2-1~exp1~20250114103432.75)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin

Source of installation: https://apt.llvm.org/jammy/dists/llvm-toolchain-jammy-19/main/
