Open
Description
Bugzilla Link | 25612 |
Version | trunk |
OS | All |
Attachments | Test code with sample functions and test they are identical |
Reporter | LLVM Bugzilla Contributor |
Extended Description
On AArch64, clang/LLVM misses an optimization for vadd_s64(vget_low_s64(x), vget_high_s64(x)). This expression could be emitted as a single pairwise add (addp.2d), the same instruction already generated for vaddvq_s64(x).
The reason someone would write the former rather than the latter is that the former is also valid ARMv7 NEON intrinsic code, whereas the latter is AArch64-only. (This arose in real code: the NEON optimizations for the Opus audio codec.)
See the attached test code. The two test functions compile to:
func1:
ext v1.16b, v0.16b, v0.16b, #8
add d0, d0, d1
fmov x0, d0
ret
func2:
addp d0, v0.2d
fmov x0, d0
ret
even though they have identical behavior.
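For readers without the attachment, here is a minimal sketch of what the two functions might look like. The names func1 and func2 come from the assembly labels above; the function bodies, the sum_via_func1/sum_via_func2 helpers, and the scalar fallback for non-AArch64 hosts are assumptions added for illustration and are not taken from the attached test code.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Portable form: these intrinsics are also available on ARMv7 NEON. */
int64_t func1(int64x2_t x) {
    int64x1_t sum = vadd_s64(vget_low_s64(x), vget_high_s64(x));
    return vget_lane_s64(sum, 0);
}

/* AArch64-only form: across-vector add of both 64-bit lanes. */
int64_t func2(int64x2_t x) {
    return vaddvq_s64(x);
}

/* Hypothetical helpers: build a vector from two scalars and call each form. */
int64_t sum_via_func1(int64_t lo, int64_t hi) {
    const int64_t lanes[2] = { lo, hi };
    return func1(vld1q_s64(lanes));
}

int64_t sum_via_func2(int64_t lo, int64_t hi) {
    const int64_t lanes[2] = { lo, hi };
    return func2(vld1q_s64(lanes));
}

#else

/* Scalar fallback (assumption) so the sketch builds on other hosts.
   Both forms reduce to a wrapping 64-bit add of the two lanes. */
static int64_t wrapping_add(int64_t lo, int64_t hi) {
    return (int64_t)((uint64_t)lo + (uint64_t)hi);
}

int64_t sum_via_func1(int64_t lo, int64_t hi) { return wrapping_add(lo, hi); }
int64_t sum_via_func2(int64_t lo, int64_t hi) { return wrapping_add(lo, hi); }

#endif
```

Compiling the AArch64 branch with optimization enabled and comparing the output for func1 and func2 should show whether the ext + add sequence is still generated for the portable form.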