Codegen emits ineffective vxorps/vaddps sequences

I tried this code:

```rust
pub fn vae_dec_linear(latent: &[f32; 4]) -> [f32; 3] {
    let weights: [[f32; 4]; 3] = [
        [49.5210, 29.0283, -23.9673, -39.4981],
        [41.1373, 42.4951, 24.7349, -50.8279],
        [40.2919, 18.9304, 30.0236, -81.9976],
    ];
    // let bias = [99.9368, 99.8421, 99.5384];
    let bias = [0.0, 0.0, 0.0];
    affine_transform(&weights, &bias, &latent)
}

pub fn affine_transform<
    T: Copy + std::ops::Add<Output = T> + std::ops::Mul<Output = T>,
    const IN: usize,
    const OUT: usize,
>(
    weights: &[[T; IN]; OUT],
    bias: &[T; OUT],
    vec: &[T; IN],
) -> [T; OUT] {
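    // NOTE: this indexing is wrong (see EDIT below); the assembly that follows was generated from this version.
    std::array::from_fn(|i| (0..OUT).fold(bias[i], |acc, j| acc + vec[i] * weights[i][j]))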
}
```

I expected to see this happen: literally nothing :) Since the bias is all zeros, I expected no instructions to be emitted for adding it.

Instead, this happened: 
```asm
example::vae_dec_linear:
        mov     rax, rdi
        vmovss  xmm0, dword ptr [rsi + 8]
        vmulss  xmm1, xmm0, dword ptr [rip + .LCPI0_0]

        vxorps  xmm2, xmm2, xmm2                 ;; here
        vaddss  xmm1, xmm1, xmm2

        vmulss  xmm2, xmm0, dword ptr [rip + .LCPI0_1]
        vmulss  xmm0, xmm0, dword ptr [rip + .LCPI0_2]
        vaddss  xmm1, xmm2, xmm1
        vaddss  xmm0, xmm0, xmm1
        vmovsd  xmm1, qword ptr [rsi]
        vmulps  xmm2, xmm1, xmmword ptr [rip + .LCPI0_3]

        vxorps  xmm3, xmm3, xmm3                 ;; and here
        vaddps  xmm2, xmm2, xmm3

        vmulps  xmm3, xmm1, xmmword ptr [rip + .LCPI0_4]
        vmulps  xmm1, xmm1, xmmword ptr [rip + .LCPI0_5]
        vaddps  xmm2, xmm3, xmm2
        vaddsubps       xmm1, xmm2, xmm1
        vmovlps qword ptr [rdi], xmm1
        vmovss  dword ptr [rdi + 8], xmm0
        ret
```

Note the two `vxorps`/`vadd` sequences, which add zero to `xmm1` and `xmm2` respectively. This is with `-C opt-level=3 -C target-cpu=native`.
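Before treating this as a missed optimization, it may be worth ruling out IEEE 754 signed-zero semantics. The fold starts from `bias[i] == 0.0`, and a product such as `0.0 * -23.9673` is `-0.0`; since `0.0 + -0.0` evaluates to `+0.0`, adding positive zero is not an identity, and the optimizer may be required to keep the add unless it is allowed to ignore the sign of zero. Adding `-0.0`, by contrast, returns its other operand unchanged. The sketch below only checks these float facts (the helper `is_neg_zero` is hypothetical, written for this example); it says nothing about what LLVM actually emits.

```rust
/// Hypothetical helper: returns true if `x` is negative zero (`-0.0`).
fn is_neg_zero(x: f32) -> bool {
    x == 0.0 && x.is_sign_negative()
}

fn main() {
    // A zero input times a negative weight yields -0.0.
    let product = 0.0_f32 * -23.9673;
    assert!(is_neg_zero(product));

    // Seeding the fold with +0.0 (the all-zero bias) turns that -0.0 into +0.0,
    // so `0.0 + x` is observable and cannot simply be deleted.
    let seeded_pos = 0.0_f32 + product;
    assert!(!is_neg_zero(seeded_pos));

    // Seeding with -0.0 preserves it: `-0.0 + x` returns `x` bit for bit.
    let seeded_neg = -0.0_f32 + product;
    assert!(is_neg_zero(seeded_neg));
    for x in [1.5_f32, -2.25, 0.0, -0.0, f32::INFINITY] {
        assert_eq!((-0.0_f32 + x).to_bits(), x.to_bits());
    }

    println!("+0.0 seed: {seeded_pos:?}, -0.0 seed: {seeded_neg:?}");
}
```

If this is the cause, writing the zero bias as `-0.0` (a true additive identity) might let the optimizer drop the add without any fast-math flags; that is a hypothesis to confirm in Compiler Explorer, not something verified here.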

EDIT: Oh, and never mind that the code has a bug; the closure should be:

```rust
    std::array::from_fn(|i| (0..IN).fold(bias[i], |acc, j| acc + vec[j] * weights[i][j]))
```
Fixing it changes the second sequence to:

```asm
        vxorps  xmm5, xmm5, xmm5
        vmulps  xmm2, xmm2, xmmword ptr [rip + .LCPI0_5]
        vaddps  xmm0, xmm0, xmm5
        vaddps  xmm0, xmm0, xmm2
```

This is not as easy to spot, and the second `vaddps` also looks suspicious.
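For reference, here is a self-contained sketch of the repro with the fix from the EDIT applied. The `main` function and its one-hot input are illustrative additions, not part of the original report; they only check that the corrected indexing computes `weights * latent + bias`, and say nothing about the generated assembly.

```rust
use std::ops::{Add, Mul};

pub fn vae_dec_linear(latent: &[f32; 4]) -> [f32; 3] {
    let weights: [[f32; 4]; 3] = [
        [49.5210, 29.0283, -23.9673, -39.4981],
        [41.1373, 42.4951, 24.7349, -50.8279],
        [40.2919, 18.9304, 30.0236, -81.9976],
    ];
    let bias = [0.0, 0.0, 0.0];
    affine_transform(&weights, &bias, latent)
}

/// Computes `weights * vec + bias` for a row-major `OUT x IN` matrix.
pub fn affine_transform<T, const IN: usize, const OUT: usize>(
    weights: &[[T; IN]; OUT],
    bias: &[T; OUT],
    vec: &[T; IN],
) -> [T; OUT]
where
    T: Copy + Add<Output = T> + Mul<Output = T>,
{
    // Output i is the dot product of row `weights[i]` with `vec`, starting from `bias[i]`.
    std::array::from_fn(|i| (0..IN).fold(bias[i], |acc, j| acc + vec[j] * weights[i][j]))
}

fn main() {
    // A one-hot input selects the first column of the weight matrix.
    let out = vae_dec_linear(&[1.0, 0.0, 0.0, 0.0]);
    assert_eq!(out, [49.5210_f32, 41.1373, 40.2919]);
    println!("{out:?}");
}
```

Exact comparison is safe here because every other product is a signed zero, so each output is exactly the selected weight.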

### Meta


`rustc --version --verbose`:

Every version I checked on Compiler Explorer, including nightly.




Codegen emits ineffective vxorps/vaddps sequences #114033