Optimize vadd_s64(vget_low_s64(x), vget_high_s64(x)) as vaddvq_s64(x) #25986

|  |  |
| --- | --- |
| Bugzilla Link | [25612](https://llvm.org/bz25612) |
| Version | trunk |
| OS | All |
| Attachments | [Test code with sample functions and test they are identical](https://user-images.githubusercontent.com/60944935/143752493-ae67efc1-8e7a-43e0-b358-e38870f8144f.gz) |
| Reporter | LLVM Bugzilla Contributor |

## Extended Description 


On AArch64, clang/LLVM misses an optimization for vadd_s64(vget_low_s64(x), vget_high_s64(x)). It could be emitted as a single addp.2d, the same way vaddvq_s64(x) is.

The reason someone would write the former rather than the latter is that the former is also valid ARMv7 Neon intrinsic code, whereas the latter is AArch64-only. (This arose in actual code: the Neon optimizations for the Opus audio codec.)

See the attached test code. The two test functions compile to:

func1:
        ext     v1.16b, v0.16b, v0.16b, #8
        add     d0, d0, d1
        fmov    x0, d0
        ret


func2:
        addp    d0, v0.2d
        fmov    x0, d0
        ret

even though they have identical behavior.
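The attachment is not reproduced here, so the following is only a minimal sketch of what the two functions might look like. The names `func1` and `func2` come from the listings above, but their bodies are an assumption based on the intrinsics named in the title. `forms_agree` is a hypothetical helper added for illustration, and the non-AArch64 branch is only a stub so that the file still builds on other hosts.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Assumed body of func1: the portable form, which is also valid ARMv7 Neon code. */
static inline int64_t func1(int64x2_t x) {
    int64x1_t sum = vadd_s64(vget_low_s64(x), vget_high_s64(x));
    return vget_lane_s64(sum, 0);
}

/* Assumed body of func2: the AArch64-only across-vector add. */
static inline int64_t func2(int64x2_t x) {
    return vaddvq_s64(x);
}

/* Hypothetical helper: build a vector from two lanes and compare both forms. */
static int forms_agree(int64_t lo, int64_t hi) {
    int64x2_t x = vcombine_s64(vcreate_s64((uint64_t)lo),
                               vcreate_s64((uint64_t)hi));
    return func1(x) == func2(x);
}
#else
/* Stub for non-AArch64 hosts: there is nothing to compare here. */
static int forms_agree(int64_t lo, int64_t hi) {
    (void)lo;
    (void)hi;
    return 1;
}
#endif
```

Compiling something like this with optimization enabled for an AArch64 target and inspecting the assembly for `func1` and `func2` should show whether the compiler in use still emits the `ext` + `add` sequence for the portable form.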
