
Sinking load instructions results in worse performance and increased dynamic instruction counts #96838

Closed
@kazutakahirata


SimplifyCFG recently gained commit ede27d8 by @nikic. It now seems to backfire in some cases:

Compile the attached bcmp.ll (generated from llvm-project/libc/src/string/bcmp.cpp) as follows:

```
$ clang -O3 -S bcmp.ll -o bcmp.s
```

Then I get:

Without the patch:

```
# %bb.26:
        movdqu  -16(%rdi,%rdx), %xmm0
        movdqu  -16(%rsi,%rdx), %xmm1
        jmp     .LBB1_34
:
:
:
# %bb.33:
        movdqu  (%rdi,%rdx), %xmm0
        movdqu  (%rsi,%rdx), %xmm1
.LBB1_34:                               # %.loopexit
```

With the patch:

```
# %bb.25:
        addq    %rdx, %rdi
        addq    $-16, %rdi
        addq    %rdx, %rsi
        addq    $-16, %rsi
        jmp     .LBB1_33
:
:
:
# %bb.32:
        addq    %rdx, %rdi
        addq    %rdx, %rsi
.LBB1_33:                               # %.loopexit.sink.split
        movdqu  (%rdi), %xmm0
        movdqu  (%rsi), %xmm1
```
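As a rough worked count, consider only the instructions visible in the two excerpts; the elided ":" regions are excluded. Assume that %bb.26/%bb.25 and %bb.33/%bb.32 are the corresponding predecessor blocks, as the -16 offset suggests. Under those assumptions, each path into the join block grows as follows:

$$
\begin{aligned}
\text{without the patch:}\quad & \texttt{bb.26}: 2\ \texttt{movdqu} + 1\ \texttt{jmp} = 3, \qquad \texttt{bb.33}: 2\ \texttt{movdqu} = 2 \\
\text{with the patch:}\quad & \texttt{bb.25}: 4\ \texttt{addq} + 1\ \texttt{jmp} + 2\ \texttt{movdqu} = 7, \qquad \texttt{bb.32}: 2\ \texttt{addq} + 2\ \texttt{movdqu} = 4
\end{aligned}
$$

That is, the two loads per path stay, but the path through the -16 arm gains 4 instructions and the other arm gains 2.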

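Notice that the two movdqu loads are sunk into the join block (.LBB1_33, %.loopexit.sink.split), while the address calculations are left behind in the predecessors. Without the patch, each movdqu folds base + index + displacement (for example -16(%rdi,%rdx)) directly into its addressing mode. With the patch, each predecessor has to materialize the final address in %rdi and %rsi using separate addq instructions. This seems to result in worse performance and increased dynamic instruction counts.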
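To make the CFG shape concrete, here is a hypothetical reduced C sketch; it is not taken from bcmp.cpp. Two arms perform the same pair of 16-byte loads from different addresses and then meet at a join block. Sinking the identical loads into the join leaves only the address arithmetic in each arm. The names chunk_differs, load16, block16, n and use_tail are illustrative assumptions. A memcpy into a 16-byte struct stands in for movdqu so the sketch stays portable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte xmm value. */
typedef struct {
    uint64_t lo, hi;
} block16;

/* Unaligned 16-byte load, playing the role of movdqu. */
static block16 load16(const unsigned char *p) {
    block16 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical reduced shape: both arms load 16 bytes from a and b,
   differing only in the address. The caller must guarantee that
   16 readable bytes exist at the chosen address in both buffers. */
int chunk_differs(const unsigned char *a, const unsigned char *b,
                  size_t n, int use_tail) {
    block16 x, y;
    if (use_tail) {
        x = load16(a + n - 16); /* cf. movdqu -16(%rdi,%rdx), %xmm0 */
        y = load16(b + n - 16); /* cf. movdqu -16(%rsi,%rdx), %xmm1 */
    } else {
        x = load16(a + n);      /* cf. movdqu (%rdi,%rdx), %xmm0 */
        y = load16(b + n);      /* cf. movdqu (%rsi,%rdx), %xmm1 */
    }
    /* Join block: both arms feed the same comparison, so the loads
       are candidates for being sunk here with a phi on the address. */
    return ((x.lo ^ y.lo) | (x.hi ^ y.hi)) != 0;
}
```

Whether clang -O3 actually sinks the loads for this reduced shape has not been checked here. The sketch only mirrors the block structure of the excerpts above.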
