
LLVM vector intrinsic support for SVE? #40308

Open

Description

It'd be great to get good support for SVE, especially as SVE2 will become standard for ARMv9.

However, early tests using LLVM vector intrinsics on the A64FX did not go well.
Here is a minimal example on Godbolt, showing a vectorized (but not unrolled) dot product on the A64FX, which has 512-bit vectors.
The problem is that each <8 x double> operation gets lowered to four <2 x double> NEON instructions instead of a single SVE instruction.
v registers are NEON, and we see that the single @llvm.fma.v8f64 call was broken up into 4 separate fmla instructions.
Based on this document, SVE registers would be denoted by z[0-31].
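
For readers without the Godbolt link handy, here is a rough sketch of the kind of call involved. It is not the exact example linked above; `Vec8`, `vfma`, and `splat` are names made up for illustration, and the native code you get depends on the host CPU and the Julia/LLVM version.

```julia
# Fixed-width vector type: Julia lowers this to LLVM's <8 x double>.
const Vec8 = NTuple{8,Core.VecElement{Float64}}

# Call the LLVM intrinsic directly on the fixed-width vectors.
# `vfma` is a hypothetical helper name, not part of Base.
@inline vfma(a::Vec8, b::Vec8, c::Vec8) =
    ccall("llvm.fma.v8f64", llvmcall, Vec8, (Vec8, Vec8, Vec8), a, b, c)

# Hypothetical helper: a Vec8 holding the same value in every lane.
splat(x::Float64) = ntuple(_ -> Core.VecElement(x), Val(8))

# Each lane computes 2.0 * 3.0 + 1.0.
r = vfma(splat(2.0), splat(3.0), splat(1.0))

# To see which registers the host uses for this call:
# using InteractiveUtils
# code_native(vfma, (Vec8, Vec8, Vec8))
```

On an SVE machine, `v` registers in the `code_native` output would indicate NEON lowering, while `z` registers would indicate SVE.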

This makes me wonder whether, to actually get intrinsic support for SVE, we'd need to use <vscale x 2 x double>, etc., instead?
This isn't compelling in Julia (unlike C/C++/wherever folks distribute binaries), since we're probably compiling for the specific target machine anyway, and can easily find the appropriate vector length using @llvm.vscale.i64.
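
To make the relationship between the two type spellings concrete, here is a small sketch of the arithmetic, assuming the usual SVE convention that vector lengths are multiples of 128 bits and that vscale is that multiple. The helper names are hypothetical, and the vector width is passed in by hand rather than queried from the hardware.

```julia
# SVE vector lengths are multiples of 128 bits; vscale is that multiple.
const SVE_GRANULE_BITS = 128

# Hypothetical helper: the vscale value for a given hardware vector width in bits.
sve_vscale(vector_bits::Integer) = vector_bits ÷ SVE_GRANULE_BITS

# Hypothetical helper: total lanes in <vscale x N x T>,
# where N elements of type T fill one 128-bit granule.
function sve_lanes(::Type{T}, vector_bits::Integer) where {T}
    per_granule = SVE_GRANULE_BITS ÷ (8 * sizeof(T))  # 2 for Float64
    return sve_vscale(vector_bits) * per_granule
end

sve_vscale(512)          # A64FX: 4
sve_lanes(Float64, 512)  # 8, the same lane count as <8 x double>
```

So on the A64FX, <vscale x 2 x double> and <8 x double> describe the same number of lanes; the difference is only whether that count is fixed in the IR or resolved from the hardware.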

Furthermore, we don't currently have any way to represent that. NTuple{L,Core.VecElement{T}} maps to <L x T>, but there is no vscale equivalent.

Anyone have any insight into/knowledge about this?


Labels

compiler:codegen (Generation of LLVM IR and native code)
compiler:simd (instruction-level vectorization)
system:arm (ARMv7 and AArch64)
