
Optimize vadd_s64(vget_low_s64(x), vget_high_s64(x)) as vaddvq_s64(x) #25986

Open
@llvmbot

Description

Bugzilla Link 25612
Version trunk
OS All
Attachments Test code with sample functions and a test that they are identical
Reporter LLVM Bugzilla Contributor

Extended Description

On AArch64, clang/LLVM misses an optimization for vadd_s64(vget_low_s64(x), vget_high_s64(x)). The expression adds the two 64-bit lanes of x, so it can be emitted as a single addp.2d (addp d0, v0.2d), the same instruction used for vaddvq_s64(x).

The reason someone would write the former rather than the latter is that the former is also valid ARMv7 Neon intrinsic code, whereas the latter is AArch64-only. (This came up in real code: the Neon optimizations for the Opus audio codec.)
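Until the compiler recognizes the portable form, code that targets both architectures can select the intrinsic per target by hand. The sketch below is one way to do that; the helper name `sum_pair_s64`, its pointer-based interface, and the scalar fallback for non-Neon hosts are all hypothetical choices made for illustration, not part of the Opus code or the attached test.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: returns p[0] + p[1] with two's-complement wrap-around,
 * using the best available form for the target. */
static inline int64_t sum_pair_s64(const int64_t p[2])
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* AArch64-only across-vector add; the report shows this compiling to addp d0, v0.2d. */
    return vaddvq_s64(vld1q_s64(p));
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* ARMv7 Neon form: add the low and high 64-bit halves, then read lane 0. */
    int64x2_t x = vld1q_s64(p);
    return vget_lane_s64(vadd_s64(vget_low_s64(x), vget_high_s64(x)), 0);
#else
    /* Scalar fallback for non-Neon hosts: the unsigned add avoids signed-overflow UB,
     * and the conversion back to int64_t wraps on GCC and Clang. */
    return (int64_t)((uint64_t)p[0] + (uint64_t)p[1]);
#endif
}
```

For example, with `int64_t v[2] = {3, 4};`, `sum_pair_s64(v)` returns 7 on every path. The manual dispatch only exists because, as this report describes, the portable form is not turned into addp on AArch64.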

See the attached test code. The two test functions compile to:

func1:
ext v1.16b, v0.16b, v0.16b, #8
add d0, d0, d1
fmov x0, d0
ret

func2:
addp d0, v0.2d
fmov x0, d0
ret

even though both compute the same value: the sum of the two 64-bit lanes of x.
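To see why the two instruction sequences agree, here is a small scalar model of them. It is a hypothetical sketch: the type `vreg` and the functions `make_vreg`, `lane_u64`, `ext16`, `model_func1` and `model_func2` are invented for illustration, and they model the assembly listed above rather than the attached source, which is not reproduced here. Lane additions use `uint64_t` so they wrap like the hardware; the cast back to `int64_t` is implementation-defined in ISO C but wraps on GCC and Clang.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 128-bit vector register: lane i of the
 * .2d view occupies bytes 8*i .. 8*i+7. */
typedef struct { uint8_t b[16]; } vreg;

static vreg make_vreg(int64_t lane0, int64_t lane1)
{
    vreg v;
    memcpy(v.b, &lane0, 8);
    memcpy(v.b + 8, &lane1, 8);
    return v;
}

static uint64_t lane_u64(vreg v, int i)
{
    uint64_t x;
    memcpy(&x, v.b + 8 * i, 8);
    return x;
}

/* EXT Vd.16B, Vn.16B, Vm.16B, #imm: the 16 bytes starting at byte imm of the
 * concatenation Vm:Vn, where Vn supplies the low bytes. */
static vreg ext16(vreg n, vreg m, int imm)
{
    vreg d;
    for (int i = 0; i < 16; i++) {
        int j = i + imm;
        d.b[i] = (j < 16) ? n.b[j] : m.b[j - 16];
    }
    return d;
}

/* func1: ext v1.16b, v0.16b, v0.16b, #8 ; add d0, d0, d1 */
static int64_t model_func1(vreg v0)
{
    vreg v1 = ext16(v0, v0, 8);  /* with n == m and imm == 8, this swaps the 64-bit halves */
    /* unsigned add wraps like the hardware; the cast back wraps on GCC/Clang */
    return (int64_t)(lane_u64(v0, 0) + lane_u64(v1, 0));
}

/* func2: addp d0, v0.2d (pairwise add of the two 64-bit lanes) */
static int64_t model_func2(vreg v0)
{
    return (int64_t)(lane_u64(v0, 0) + lane_u64(v0, 1));
}
```

For instance, `model_func1(make_vreg(3, 4))` and `model_func2(make_vreg(3, 4))` both evaluate to 7: after the `ext`, lane 0 of v1 holds lane 1 of v0, so `add d0, d0, d1` computes exactly the lane sum that `addp` produces in one instruction.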
