Copying a static value in an array with -Oz far from optimal size #108841

Open
AreaZR opened this issue Sep 16, 2024 · 3 comments

Comments

@AreaZR
Contributor

AreaZR commented Sep 16, 2024

https://godbolt.org/z/o6cf889oG

Even ignoring the fact that GCC's code is smaller, there is an instruction that is not being used and that could make this happen faster:
rep stosd
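The Godbolt source is not reproduced in this thread. Judging only from the assembly posted in the replies (a 128-byte object h filled with the constant 23321332), the test case is presumably close to the sketch below. The element type, the helper name fill_h, and the exact loop form are assumptions, and in the linked example the loop apparently sits directly in main.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>   /* int32_t */

/* Assumed test case: a 32-element global array of 4-byte integers. */
int32_t h[32];

/* 32 elements * 4 bytes matches the `.zero 128` object in the listings. */
static_assert(sizeof(h) == 128, "h should occupy 128 bytes");

/* Hypothetical helper: store the same constant into every element.
 * This is the loop that both compilers keep as a loop at -Oz. */
void fill_h(void)
{
    for (int i = 0; i < 32; i++)
        h[i] = 23321332;
}
```

Compiling a file like this with -Oz and inspecting the output should show whether the store loop is kept or replaced by a string instruction.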

@rilysh
Contributor

rilysh commented Sep 29, 2024

-Oz also enables -O2-level optimizations, and stosd is 3 uops, whereas inc or add on a 64-bit (REX-prefixed) register is 1 uop. GCC and Clang generate very similar output.

GCC:

main:
        xor     eax, eax
.L2:
        mov     DWORD PTR h[0+rax*4], 23321332
        inc     rax
        cmp     rax, 32
        jne     .L2
        xor     eax, eax
        ret
h:
        .zero   128

And Clang:

main:
        xor     eax, eax
        lea     rcx, [rip + h]
.LBB0_1:
        cmp     rax, 128
        je      .LBB0_2
        mov     dword ptr [rax + rcx], 23321332
        add     rax, 4
        jmp     .LBB0_1
.LBB0_2:
        xor     eax, eax
        ret

h:
        .zero   128

GCC's generated assembly increments rax by 1 on each iteration and scales it by 4 in the addressing mode (32 iterations in total), whereas Clang adds 4 to a byte offset on each iteration until it reaches 128 (also 32 iterations). Both seem nearly identical in terms of size, so I don't see any problem here.
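To check the iteration counts, the two loop shapes can be mirrored in C. This is only a sketch: the function names are hypothetical, and the mapping from each C loop to the listing above is my reading of the assembly, not compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { N = 32 };  /* number of 4-byte elements in h */

/* GCC-style loop: an element index runs 0..31 and is scaled by 4
 * in the address (mov DWORD PTR h[0+rax*4], imm; inc rax; cmp rax, 32). */
size_t fill_indexed(int32_t *dst)
{
    size_t iterations = 0;
    for (size_t i = 0; i != N; i++) {
        dst[i] = 23321332;
        iterations++;
    }
    return iterations;
}

/* Clang-style loop: a byte offset runs 0..124 in steps of 4
 * (cmp rax, 128; mov dword ptr [rax + rcx], imm; add rax, 4). */
size_t fill_byte_offset(int32_t *dst)
{
    size_t iterations = 0;
    unsigned char *base = (unsigned char *)dst;
    const int32_t value = 23321332;
    for (size_t off = 0; off != N * sizeof(int32_t); off += 4) {
        memcpy(base + off, &value, sizeof value);
        iterations++;
    }
    return iterations;
}

/* Both shapes perform one 4-byte store per iteration, 32 times. */
void compare_loops(void)
{
    int32_t a[N], b[N];
    assert(fill_indexed(a) == 32);
    assert(fill_byte_offset(b) == 32);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The two loops differ in how the address is formed, not in how many stores they execute.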

@AreaZR
Contributor Author

AreaZR commented Sep 30, 2024

The point is that you can use rep stosd; this is something GCC ALSO gets wrong.
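For readers unfamiliar with the instruction, here is a rough C model of what rep stosd does in 64-bit mode with the direction flag clear: it stores eax at [rdi], advances rdi by 4, and decrements rcx until rcx reaches zero. The function name and parameters are hypothetical stand-ins for those registers. The model describes behavior only and says nothing about encoding size or uop cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Behavioral model of `rep stosd` (DF = 0):
 *   dst   plays the role of rdi
 *   value plays the role of eax
 *   count plays the role of rcx
 * Returns the final destination pointer, as rdi would hold afterwards. */
uint32_t *rep_stosd_model(uint32_t *dst, uint32_t value, size_t count)
{
    while (count != 0) {
        *dst = value;  /* store eax at [rdi] */
        dst++;         /* rdi += 4 */
        count--;       /* rcx -= 1 */
    }
    return dst;
}

/* Usage sketch: fill a 32-element buffer the way the thread's loop does. */
void fill_with_model(void)
{
    uint32_t buf[32];
    uint32_t *end = rep_stosd_model(buf, 23321332u, 32);
    assert(end == buf + 32);
    assert(buf[0] == 23321332u && buf[31] == 23321332u);
}
```

In assembly this would require loading rdi, eax, and rcx before the single rep stosd, so any size comparison has to count that setup as well.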

@rilysh
Contributor

rilysh commented Sep 30, 2024

The point is that you can use rep stosd; this is something GCC ALSO gets wrong.

Neither one is wrong; I think you didn't read my previous message. -Oz also enables a few optimizations from -O2, and stosd is 3 uops.
