Inefficient scalable vector codegen after "[SCEV] Add SCEVType to represent vscale." #61742

Closed
@UsmanNadeem

See sample code: https://godbolt.org/z/dnPzGrTqj

The issue was bisected to the patch "[SCEV] Add SCEVType to represent vscale.":
https://reviews.llvm.org/D144891 or 62d11b2

A quick look shows that the difference comes from LSR (Loop Strength Reduction); disabling LSR gives good code again.

Before the patch, we were getting:

        ld1d    { z1.d }, p0/z, [x1, x8, lsl #3]
        add     x11, x1, x10
        ld1d    { z2.d }, p0/z, [x11, #1, mul vl]
        ld1d    { z3.d }, p0/z, [x11, #2, mul vl]
        ld1d    { z4.d }, p0/z, [x11, #3, mul vl]
        ld1d    { z5.d }, p0/z, [x11, #4, mul vl]
        ...

Now we are seeing address calculation inside the loop:

        ld1d    { z1.d }, p0/z, [x1, x8, lsl #3]
        add     x11, x1, x10
        ld1b    { z2.b }, p1/z, [x11, x9]
        add     x12, x11, x9
        ld1b    { z3.b }, p1/z, [x12, x9]
        add     x13, x12, x9
        ld1b    { z4.b }, p1/z, [x13, x9]
        add     x11, x13, x9
        ld1b    { z5.b }, p1/z, [x11, x9]
        ...
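
To make the difference between the two listings concrete, here is a small C model of the address arithmetic. It is only a sketch under assumptions that the issue does not state: that `x9` holds the SVE vector length in bytes (16 * vscale), and that both sequences are meant to read the same memory. The function names and `NUM_LOADS` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_LOADS = 4 }; /* the four loads into z2..z5 shown in each listing */

/* Old codegen: ld1d { zK.d }, p0/z, [base, #k, mul vl]
   The offset k * VL is encoded in the load itself, so no extra
   instructions are needed per load. */
static unsigned addrs_imm_mul_vl(uint64_t base, uint64_t vl_bytes,
                                 uint64_t out[NUM_LOADS]) {
    for (int k = 1; k <= NUM_LOADS; ++k)
        out[k - 1] = base + (uint64_t)k * vl_bytes;
    return 0; /* number of extra add instructions between the loads */
}

/* New codegen: ld1b { zK.b }, p1/z, [ptr, stride] followed by
   add xN, ptr, stride before the next load.
   ASSUMPTION: stride (x9 in the listing) equals the vector length in bytes. */
static unsigned addrs_reg_chain(uint64_t base, uint64_t stride,
                                uint64_t out[NUM_LOADS]) {
    uint64_t ptr = base;
    unsigned adds = 0;
    for (int k = 0; k < NUM_LOADS; ++k) {
        out[k] = ptr + stride;      /* address used by the load */
        if (k + 1 < NUM_LOADS) {    /* an add sits between consecutive loads */
            ptr += stride;
            ++adds;
        }
    }
    return adds;
}

/* Checks that both forms touch the same addresses for a given vscale
   and returns how many extra adds the register-offset chain needs. */
static unsigned compare_forms(uint64_t base, uint64_t vscale) {
    uint64_t vl_bytes = 16 * vscale; /* SVE vector length is vscale x 128 bits */
    uint64_t a[NUM_LOADS], b[NUM_LOADS];
    unsigned adds_imm = addrs_imm_mul_vl(base, vl_bytes, a);
    unsigned adds_reg = addrs_reg_chain(base, vl_bytes, b);
    for (int k = 0; k < NUM_LOADS; ++k)
        assert(a[k] == b[k]);
    return adds_reg - adds_imm;
}
```

Under these assumptions, `compare_forms(0x1000, 2)` reports three extra `add` instructions for the four loads shown, which matches the `add x12`, `add x13`, `add x11` chain in the second listing while the addresses themselves are unchanged.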
