Loading/storing an 8-byte-aligned i128 uses two movq instead of a single movups #72640

Open
@LingMan

Consider this example:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @foo(ptr noalias %slice.0, ptr noalias %value) unnamed_addr {
start:
  %0 = load i128, ptr %value, align 8
  store i128 %0, ptr %slice.0, align 16
  ret void
}

Godbolt: https://llc.godbolt.org/z/fMMPbacGx

Expected asm output:

foo:                                    # @foo
        movups  xmm0, xmmword ptr [rsi]
        movaps  xmmword ptr [rdi], xmm0
        ret

Actual asm output:

foo:                                    # @foo
        mov     rax, qword ptr [rsi]
        mov     rcx, qword ptr [rsi + 8]
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi], rax
        ret

Replacing the two occurrences of i128 with <2 x i64> gives the expected output instead, so the issue appears to be specific to i128. Alternatively, changing the align 8 to align 16 causes LLVM to correctly output movaps instructions.

Real world background:

I noticed that filling a slice of u128 in Rust doesn't optimize as well as filling a slice of [u64; 2].

//type T = u128;
type T = [u64; 2];
pub fn foo(slice: &mut [T], value: &T) {
    slice.fill(*value)
}
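For side-by-side comparison, the two variants can also be written as separate functions so that both appear in one compiler output. This is only a sketch: the names `fill_u128` and `fill_pair` are made up for illustration and are not part of the original report, and the generated assembly depends on the rustc/LLVM version and target.

```rust
// Hypothetical names; both functions do the same 16-byte-per-element fill
// and differ only in the element type.

/// Fill with u128 elements (the variant reported as optimizing worse).
pub fn fill_u128(slice: &mut [u128], value: &u128) {
    slice.fill(*value)
}

/// Fill with [u64; 2] elements (the variant reported as optimizing well).
pub fn fill_pair(slice: &mut [[u64; 2]], value: &[u64; 2]) {
    slice.fill(*value)
}
```

Compiling this as a library with something like `rustc --crate-type=lib -C opt-level=3 --emit asm`, or pasting it into Godbolt, should allow the loop bodies of the two functions to be compared directly.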
