Clang vectorizes simple loop in O3 worse than GCC does in O2. #130318

Open
@denzor200

Look at the sample:

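#include <array>

// Element-wise sum of two fixed-size arrays; the trip count (10000) is a compile-time constant.
void addition(std::array<int, 10000>& result,
              const std::array<int, 10000>& first,
              const std::array<int, 10000>& second)
{
    for (int i = 0; i < 10000; ++i) {
        result[i] = first[i] + second[i];
    }
}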

As we can see, this code can be vectorized effectively. Moreover, it works on a fixed number of elements (10000), which is divisible by 16 with no remainder, so the compiler doesn't have to emit a second loop to handle a tail of fewer than 16 leftover elements: a single vectorized loop would be enough.
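To make the divisibility argument concrete, here is a minimal hand-blocked sketch of the same function. The name `addition_blocked` and the constants `kCount` and `kBlock` are hypothetical, and the block size of 16 is taken from the reasoning above; the vectorization and interleave factors a compiler actually picks depend on the target and may differ.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kCount = 10000;  // element count from the sample
constexpr std::size_t kBlock = 16;     // assumed elements handled per vectorized iteration

// 10000 = 16 * 625, so whole blocks cover every element and no tail remains.
static_assert(kCount % kBlock == 0, "a scalar remainder loop would be needed");

// Hypothetical hand-blocked equivalent of addition(): the outer loop steps
// through 625 blocks, the inner loop stands in for one vector-wide operation.
void addition_blocked(std::array<int, kCount>& result,
                      const std::array<int, kCount>& first,
                      const std::array<int, kCount>& second)
{
    for (std::size_t base = 0; base < kCount; base += kBlock) {
        for (std::size_t lane = 0; lane < kBlock; ++lane) {
            result[base + lane] = first[base + lane] + second[base + lane];
        }
    }
}
```

Because the `static_assert` holds, there is no index that the blocked loop fails to reach, which is why a remainder loop looks unnecessary for this trip count.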

The full snippet is here: https://godbolt.org/z/fcedzGEYa
As the snippet shows, GCC doesn't create a second loop to handle the remainder, while Clang does.
Is there a reason Clang does this? Can it be fixed?
