Loading/storing an 8-byte-aligned i128 uses two `movq` instead of a single `movups` #72640

Consider this example:
```llvm
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @foo(ptr noalias %slice.0, ptr noalias %value) unnamed_addr {
start:
  %0 = load i128, ptr %value, align 8
  store i128 %0, ptr %slice.0, align 16
  ret void
}
```
Godbolt: https://llc.godbolt.org/z/fMMPbacGx

Expected asm output:
```asm
foo:                                    # @foo
        movups  xmm0, xmmword ptr [rsi]
        movaps  xmmword ptr [rdi], xmm0
        ret
```

Actual asm output:
```asm
foo:                                    # @foo
        mov     rax, qword ptr [rsi]
        mov     rcx, qword ptr [rsi + 8]
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi], rax
        ret
```

Replacing both occurrences of `i128` with `<2 x i64>` gives the expected output, so the issue appears to be specific to `i128`. Alternatively, changing the load's `align 8` to `align 16` makes LLVM emit `movaps` instructions instead of the pair of 64-bit moves.

**Real-world background:**

I noticed that filling a slice of `u128` in Rust doesn't optimize as well as filling a slice of `[u64; 2]`.
```rust
//type T = u128;
type T = [u64; 2];
pub fn foo(slice: &mut [T], value: &T) {
    slice.fill(*value)
}
```
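To compare the two cases without toggling the type alias, the snippet above can be split into two functions and compiled side by side. This is only a sketch: the names `fill_u128` and `fill_pair` are made up for illustration and do not come from the original report, and the generated assembly depends on the compiler version and optimization flags.

```rust
// Hypothetical side-by-side variant of the snippet above.
// `fill_u128` and `fill_pair` are illustrative names, not part of the report.

/// Fills a slice of `u128` with a copy of `value`.
pub fn fill_u128(slice: &mut [u128], value: &u128) {
    slice.fill(*value)
}

/// Fills a slice of `[u64; 2]` with a copy of `value`.
/// Each element has the same size as a `u128`, but is described to the
/// backend as two 64-bit integers rather than one 128-bit integer.
pub fn fill_pair(slice: &mut [[u64; 2]], value: &[u64; 2]) {
    slice.fill(*value)
}
```

Both functions do the same work per element, so any difference in the emitted loop body comes from how the element type is lowered.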