opened on Oct 29, 2023
const N: usize = 4096;

#[inline(never)]
pub fn example_fn(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for i in 0..N {
        res[i] = xs[i] + ys[i];
    }
}
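An equivalent formulation of the same loop, written with iterators instead of indexing, is sketched below. It is only an alternative way to express the computation, not a claim about what rustc emits for it; compiling it on Godbolt with the same target and flags would show whether it changes the generated code. The name `example_fn_zip` is hypothetical, and the original function is repeated so the block compiles on its own.

```rust
const N: usize = 4096;

/// Original indexed loop, repeated from the snippet above.
#[inline(never)]
pub fn example_fn(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for i in 0..N {
        res[i] = xs[i] + ys[i];
    }
}

/// Hypothetical iterator-based variant: `zip` walks the three arrays in
/// lockstep, so the source contains no indexing (and no bounds checks).
/// Like the original, `x + y` panics on overflow in debug builds and
/// wraps in release builds.
#[inline(never)]
pub fn example_fn_zip(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for ((r, &x), &y) in res.iter_mut().zip(xs.iter()).zip(ys.iter()) {
        *r = x + y;
    }
}
```

Both functions should fill `res` with identical values for any input that does not overflow.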
I expected to see `VADD` in the resulting assembly, but instead it uses plain `add`s:
add z0.d, z4.d, z0.d
add z1.d, z5.d, z1.d
add z2.d, z6.d, z2.d
add z3.d, z7.d, z3.d
Link to godbolt: https://godbolt.org/z/YWhT5GqxW
For reference, I would expect it to mirror the x86 SSE output: https://godbolt.org/z/z5Pf3G17e
I think rustc should be able to autovectorize this loop and use `VADD` without any additional hints or target features enabled.
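For comparison, the sketch below spells out a two-lane 64-bit vector add by hand with `std::arch` intrinsics: `vld1q_u64`, `vaddq_u64` and `vst1q_u64` on AArch64, and `_mm_loadu_si128`, `_mm_add_epi64` and `_mm_storeu_si128` (SSE2) on x86_64. On AArch64 the NEON add is typically emitted as `add vD.2d, vN.2d, vM.2d`; `VADD` is the AArch32 spelling of that instruction, and the `z` registers in the output above belong to SVE, so comparing these outputs may help narrow down which vector extension the Godbolt target enables. The name `add_explicit` is hypothetical, and the scalar fallback exists only so the sketch builds on other targets.

```rust
const N: usize = 4096;

/// AArch64: NEON intrinsics; each `vaddq_u64` adds two `u64` lanes.
#[cfg(target_arch = "aarch64")]
pub fn add_explicit(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    use std::arch::aarch64::{vaddq_u64, vld1q_u64, vst1q_u64};
    // N is even, so stepping by 2 visits every element with no scalar tail.
    for i in (0..N).step_by(2) {
        // SAFETY: i + 1 < N keeps each two-lane load/store in bounds, and
        // NEON is enabled by default on aarch64 targets.
        unsafe {
            let a = vld1q_u64(xs.as_ptr().add(i));
            let b = vld1q_u64(ys.as_ptr().add(i));
            vst1q_u64(res.as_mut_ptr().add(i), vaddq_u64(a, b));
        }
    }
}

/// x86_64: SSE2 intrinsics; each `_mm_add_epi64` adds two 64-bit lanes.
#[cfg(target_arch = "x86_64")]
pub fn add_explicit(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    use std::arch::x86_64::{__m128i, _mm_add_epi64, _mm_loadu_si128, _mm_storeu_si128};
    for i in (0..N).step_by(2) {
        // SAFETY: same bounds argument as above; SSE2 is part of the x86_64
        // baseline, and the unaligned load/store variants need no alignment.
        unsafe {
            let a = _mm_loadu_si128(xs.as_ptr().add(i) as *const __m128i);
            let b = _mm_loadu_si128(ys.as_ptr().add(i) as *const __m128i);
            _mm_storeu_si128(res.as_mut_ptr().add(i) as *mut __m128i, _mm_add_epi64(a, b));
        }
    }
}

/// Any other target: plain scalar loop with the same wrapping semantics.
#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
pub fn add_explicit(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for i in 0..N {
        res[i] = xs[i].wrapping_add(ys[i]);
    }
}
```

Unlike the original `+`, which panics on overflow in debug builds, all three variants always wrap, matching the behavior of a release build.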
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
Area: Autovectorization, which can impact perf or code size
Category: An issue highlighting optimization opportunities or PRs implementing such
Issue: Problems and improvements with respect to performance of generated code.
Armv8-A or later processors in AArch64 mode
Relevant to the compiler team, which will review and decide on the PR/issue.