Rustc fails to perform autovectorization for a loop when targeting aarch64-apple-darwin #117345

Open

Description

const N: usize = 4096;

#[inline(never)]
pub fn example_fn(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for i in 0..N {
        res[i] = xs[i] + ys[i];
    }
}

I expected to see VADD in the resulting assembly, but instead it uses only plain add instructions:

        add     z0.d, z4.d, z0.d
        add     z1.d, z5.d, z1.d
        add     z2.d, z6.d, z2.d
        add     z3.d, z7.d, z3.d

Link to godbolt: https://godbolt.org/z/YWhT5GqxW

For reference, I would expect the output to mirror the x86 SSE version: https://godbolt.org/z/z5Pf3G17e

I think rustc should be able to autovectorize this loop and use VADD without any additional hints or target features enabled.
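
As a point of comparison, here is a minimal sketch of the same loop written with iterators instead of indexing. It is not part of the original report: the name `example_fn_zip` is hypothetical, and `wrapping_add` is an assumption chosen to make the release-mode overflow behavior of `+` explicit. Whether it produces different assembly on aarch64-apple-darwin has not been checked here.

```rust
const N: usize = 4096;

// Hypothetical iterator-based variant of `example_fn` (not from the report).
// Zipping the three arrays removes the explicit index, so no bounds checks
// are involved, and `wrapping_add` states the wrap-on-overflow behavior
// that `+` has in release builds.
#[inline(never)]
pub fn example_fn_zip(xs: &[u64; N], ys: &[u64; N], res: &mut [u64; N]) {
    for ((r, x), y) in res.iter_mut().zip(xs.iter()).zip(ys.iter()) {
        *r = x.wrapping_add(*y);
    }
}
```

Compiling both variants with `rustc -C opt-level=3 --crate-type=lib --emit=asm --target aarch64-apple-darwin` and comparing the loop bodies should show whether the source form makes any difference to the generated code.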

    Labels

    A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    A-autovectorization: Area: Autovectorization, which can impact perf or code size
    C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
    I-slow: Issue: Problems and improvements with respect to performance of generated code.
    O-AArch64: Armv8-A or later processors in AArch64 mode
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
