Codegen emits ineffective vxorps/vaddps sequences #114033

Open
@ksf

Description

I tried this code:

pub fn vae_dec_linear(latent: &[f32; 4]) -> [f32; 3] {
    let weights: [[f32; 4]; 3] = [
        [49.5210, 29.0283, -23.9673, -39.4981],
        [41.1373, 42.4951, 24.7349, -50.8279],
        [40.2919, 18.9304, 30.0236, -81.9976],
    ];
  //  let bias = [99.9368, 99.8421, 99.5384];
    let bias = [0.0, 0.0, 0.0];
    affine_transform(&weights, &bias, &latent)
}

pub fn affine_transform<
    T: Copy + std::ops::Add<Output = T> + std::ops::Mul<Output = T>,
    const IN: usize,
    const OUT: usize,
>(
    weights: &[[T; IN]; OUT],
    bias: &[T; OUT],
    vec: &[T; IN],
) -> [T; OUT] {
    std::array::from_fn(|i| (0..OUT).fold(bias[i], |acc, j| acc + vec[i] * weights[i][j]))
}

I expected to see this happen: Literally nothing :)

Instead, this happened:

example::vae_dec_linear:
        mov     rax, rdi
        vmovss  xmm0, dword ptr [rsi + 8]
        vmulss  xmm1, xmm0, dword ptr [rip + .LCPI0_0]

        vxorps  xmm2, xmm2, xmm2                 ;; here
        vaddss  xmm1, xmm1, xmm2

        vmulss  xmm2, xmm0, dword ptr [rip + .LCPI0_1]
        vmulss  xmm0, xmm0, dword ptr [rip + .LCPI0_2]
        vaddss  xmm1, xmm2, xmm1
        vaddss  xmm0, xmm0, xmm1
        vmovsd  xmm1, qword ptr [rsi]
        vmulps  xmm2, xmm1, xmmword ptr [rip + .LCPI0_3]

        vxorps  xmm3, xmm3, xmm3                 ;; and here
        vaddps  xmm2, xmm2, xmm3

        vmulps  xmm3, xmm1, xmmword ptr [rip + .LCPI0_4]
        vmulps  xmm1, xmm1, xmmword ptr [rip + .LCPI0_5]
        vaddps  xmm2, xmm3, xmm2
        vaddsubps       xmm1, xmm2, xmm1
        vmovlps qword ptr [rdi], xmm1
        vmovss  dword ptr [rdi + 8], xmm0
        ret

Note the two xor/add sequences, which add zero to xmm1 and xmm2, respectively. This is with -Copt-level=3 -C target-cpu=native.
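One possible reason the additions survive (my reading, not something stated in the report): under IEEE 754, adding +0.0 is not an identity, because -0.0 + 0.0 yields +0.0 in the default rounding mode, so the compiler cannot drop the add without changing the result for a negative-zero product. Adding -0.0, by contrast, leaves every non-NaN input unchanged. A minimal sketch of that difference follows; the helper names `add_pos_zero` and `add_neg_zero` are hypothetical, and whether this fully explains the output above is an assumption.

```rust
/// Adds +0.0, as the all-zero `bias` in the report does.
/// Not an identity: a -0.0 input comes back as +0.0.
pub fn add_pos_zero(x: f32) -> f32 {
    x + 0.0
}

/// Adds -0.0, which returns every non-NaN input unchanged
/// (including -0.0) under round-to-nearest.
pub fn add_neg_zero(x: f32) -> f32 {
    x + (-0.0)
}
```

If this is indeed the cause, the sign of the result for `add_pos_zero(-0.0)` is what the extra vxorps/vaddss pair preserves.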

EDIT: Oh, and never mind that the code has a bug. The closure should be

    std::array::from_fn(|i| (0..IN).fold(bias[i], |acc, j| acc + vec[j] * weights[i][j]))

That changes the second sequence to

        vxorps  xmm5, xmm5, xmm5
        vmulps  xmm2, xmm2, xmmword ptr [rip + .LCPI0_5]
        vaddps  xmm0, xmm0, xmm5
        vaddps  xmm0, xmm0, xmm2

This is not as easy to spot, and the second vaddps also looks suspicious.
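For reference, here is a sketch of the corrected closure from the EDIT dropped into the full function, together with a small input that can be checked by hand. The `example` function and its weights, bias, and input are made up for illustration and are not the values from the report.

```rust
use std::ops::{Add, Mul};

/// Computes `weights * vec + bias`: each output row folds over all IN
/// columns, starting from that row's bias.
pub fn affine_transform<T, const IN: usize, const OUT: usize>(
    weights: &[[T; IN]; OUT],
    bias: &[T; OUT],
    vec: &[T; IN],
) -> [T; OUT]
where
    T: Copy + Add<Output = T> + Mul<Output = T>,
{
    std::array::from_fn(|i| (0..IN).fold(bias[i], |acc, j| acc + vec[j] * weights[i][j]))
}

/// Hypothetical 2x3 example; all values are exactly representable in f32.
pub fn example() -> [f32; 2] {
    let weights: [[f32; 3]; 2] = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];
    let bias: [f32; 2] = [0.5, -0.5];
    let vec: [f32; 3] = [1.0, 0.0, -1.0];
    // Row 0: 0.5 + 1*1 + 0*2 + (-1)*3 = -1.5
    // Row 1: -0.5 + 1*4 + 0*5 + (-1)*6 = -2.5
    affine_transform(&weights, &bias, &vec)
}
```

Unlike the original closure, this version iterates `j` over `0..IN` and indexes `vec[j]`, so every input element contributes to every output.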

Meta

rustc --version --verbose:

All versions I looked at in Compiler Explorer, including nightly.

Metadata

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-codegen: Area: Code generation
C-bug: Category: This is a bug.
I-heavy: Issue: Problems and improvements with respect to binary size of generated code.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
O-x86_32: Target: x86 processors, 32 bit (like i686-*) (IA-32)
O-x86_64: Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)
