Regression from clang 16: missed vectorization in simple array initialization #111126

ZY546 · 2024-10-04T09:51:02Z

reduced code:
https://godbolt.org/z/3MYzE1v7M

long arr [20];

void f() {
    for (int i = 0; i < 20; i += 1) 
    {
        arr [i] = 1;
    }
}

clang -O3:

f():
        mov     qword ptr [rip + arr], 1
        mov     qword ptr [rip + arr+8], 1
        mov     qword ptr [rip + arr+16], 1
        mov     qword ptr [rip + arr+24], 1
        mov     qword ptr [rip + arr+32], 1
        mov     qword ptr [rip + arr+40], 1
        mov     qword ptr [rip + arr+48], 1
        mov     qword ptr [rip + arr+56], 1
        mov     qword ptr [rip + arr+64], 1
        mov     qword ptr [rip + arr+72], 1
        mov     qword ptr [rip + arr+80], 1
        mov     qword ptr [rip + arr+88], 1
        mov     qword ptr [rip + arr+96], 1
        mov     qword ptr [rip + arr+104], 1
        mov     qword ptr [rip + arr+112], 1
        mov     qword ptr [rip + arr+120], 1
        mov     qword ptr [rip + arr+128], 1
        mov     qword ptr [rip + arr+136], 1
        mov     qword ptr [rip + arr+144], 1
        mov     qword ptr [rip + arr+152], 1
        ret

expected code (clang 15 or gcc):

f():
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rip + arr], xmm0
        movaps  xmmword ptr [rip + arr+16], xmm0
        movaps  xmmword ptr [rip + arr+32], xmm0
        movaps  xmmword ptr [rip + arr+48], xmm0
        movaps  xmmword ptr [rip + arr+64], xmm0
        movaps  xmmword ptr [rip + arr+80], xmm0
        movaps  xmmword ptr [rip + arr+96], xmm0
        movaps  xmmword ptr [rip + arr+112], xmm0
        movaps  xmmword ptr [rip + arr+128], xmm0
        movaps  xmmword ptr [rip + arr+144], xmm0
        ret
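
For readers who want the expected shape at the source level, here is a minimal sketch of the same initialization written with explicit 16-byte vectors. It assumes the GCC/Clang `vector_size` extension (not standard C++); the names `vec128`, `f_vectorized` and `check_all_ones` are made up for illustration and do not come from the report. It only shows what "one constant, ten 16-byte stores" means and makes no claim about the exact instructions a given compiler emits for it.

```cpp
#include <cassert>
#include <cstring>

// Same global as the reduced test case.
long arr[20];

// 16-byte vector of longs (GCC/Clang vector extension, not standard C++).
typedef long vec128 __attribute__((vector_size(16)));

constexpr int kCount = 20;
constexpr int kLanes = sizeof(vec128) / sizeof(long);  // 2 when long is 8 bytes
static_assert(kCount % kLanes == 0, "array must split into whole vectors");

void f_vectorized() {
    // Build the all-ones vector once, analogous to the single constant load.
    vec128 ones;
    for (int lane = 0; lane < kLanes; ++lane) ones[lane] = 1;

    // One 16-byte store per step; memcpy avoids alignment and aliasing assumptions.
    for (int i = 0; i < kCount; i += kLanes)
        std::memcpy(&arr[i], &ones, sizeof ones);
}

// Usage check: every element should end up as 1.
void check_all_ones() {
    for (int i = 0; i < kCount; ++i) assert(arr[i] == 1);
}
```

Calling `f_vectorized()` followed by `check_all_ones()` confirms the result matches the scalar loop.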

topperc · 2024-10-04T17:50:02Z

Looks like it was fully unrolled before the loop vectorizer saw it. Need to check why SLP didn't catch it.
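
As a rough source-level picture of what the SLP vectorizer is left with after full unrolling (a sketch, not actual compiler IR; `f_unrolled` and `check_unrolled` are hypothetical names): the loop is gone and only 20 independent stores of the same constant to adjacent addresses remain. Packing neighbouring stores like these into vector stores is the job SLP would have to do here.

```cpp
#include <cassert>

long arr[20];

// Straight-line equivalent of the loop: no induction variable, no branch,
// just adjacent scalar stores of the constant 1.
void f_unrolled() {
    arr[0] = 1;  arr[1] = 1;  arr[2] = 1;  arr[3] = 1;
    arr[4] = 1;  arr[5] = 1;  arr[6] = 1;  arr[7] = 1;
    arr[8] = 1;  arr[9] = 1;  arr[10] = 1; arr[11] = 1;
    arr[12] = 1; arr[13] = 1; arr[14] = 1; arr[15] = 1;
    arr[16] = 1; arr[17] = 1; arr[18] = 1; arr[19] = 1;
}

// Usage check: same observable result as the original loop.
void check_unrolled() {
    for (int i = 0; i < 20; ++i) assert(arr[i] == 1);
}
```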

topperc · 2024-10-04T17:54:30Z

SLP does vectorize using 256-bit vectors with -mavx. So SLP is capable of vectorizing it. Maybe a cost model issue?

CC: @RKSimon @alexey-bataev

alexey-bataev · 2024-10-04T18:01:11Z

> SLP does vectorize using 256-bit vectors with -mavx. So SLP is capable of vectorizing it. Maybe a cost model issue?
>
> CC: @RKSimon @alexey-bataev

The cost model decides vectorization is not profitable; the code is vectorized with -slp-threshold=-1.
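
To illustrate why lowering the threshold flips the decision, here is a toy model of the profitability check. It assumes the check has the form "vectorize only if cost < -threshold" with a default threshold of 0, which is my understanding of the SLP vectorizer but should be verified against the source. The function name `slp_would_vectorize` and the break-even cost of 0 are hypothetical and were not measured for this test case.

```cpp
#include <cassert>

// Hypothetical stand-in for the SLP profitability check.
// cost = vector cost minus scalar cost; negative means the vector form is cheaper.
// Assumed form of the check: vectorize only when cost < -threshold.
bool slp_would_vectorize(int cost, int threshold) {
    return cost < -threshold;
}

void demo_threshold() {
    // Illustrative break-even cost, not a number taken from the compiler.
    const int break_even = 0;
    assert(!slp_would_vectorize(break_even, 0));  // default threshold: rejected
    assert(slp_would_vectorize(break_even, -1));  // threshold -1: accepted
    assert(slp_would_vectorize(-2, 0));           // a clear gain passes by default
}
```

Under these assumptions, a tree that the cost model rates as break-even is rejected by default and accepted once the threshold is -1, which would be consistent with the behaviour described above.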

github-actions bot added the "clang" label (Clang issues not falling into any other category) Oct 4, 2024

antoniofrighetto self-assigned this Oct 4, 2024

EugeneZelenko added the "vectorizers", "missed-optimization", and "regression" labels and removed the "clang" (Clang issues not falling into any other category) and "missed-optimization" labels Oct 4, 2024
