Description
Given a basic struct like this,

```rust
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Entity {
    g: u32,
    i: u32,
}
```
the generated `==` is suboptimal:
```rust
#[no_mangle]
pub fn derived_eq(x: &Entity, y: &Entity) -> bool {
    x == y
}
```
```asm
derived_eq:
        movq    xmm0, qword ptr [rdi]
        movq    xmm1, qword ptr [rsi]
        pcmpeqd xmm1, xmm0
        pshufd  xmm0, xmm1, 80
        movmskpd        eax, xmm0
        cmp     eax, 3
        sete    al
        ret
```
https://rust.godbolt.org/z/1b1xsnzx6
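For context, `#[derive(PartialEq)]` on a struct compares the fields in declaration order and joins the results with the short-circuiting `&&`. The hand-written impl below is a rough approximation of what the derive produces for `Entity` (a sketch, not the literal macro output), to make the `&&` visible:

```rust
// Rough approximation of what `#[derive(PartialEq, Eq)]` generates for `Entity`.
// Not the literal macro expansion; it only illustrates the `&&` between fields.
#[derive(Copy, Clone, Debug)]
pub struct Entity {
    g: u32,
    i: u32,
}

impl PartialEq for Entity {
    fn eq(&self, other: &Self) -> bool {
        // Short-circuiting: `self.i == other.i` is evaluated only when
        // `self.g == other.g` is already true.
        self.g == other.g && self.i == other.i
    }
}

impl Eq for Entity {}
```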
For comparison, using the non-short-circuiting `&` instead of `&&`,
```rust
#[no_mangle]
pub fn good_eq(x: &Entity, y: &Entity) -> bool {
    (x.g == y.g) & (x.i == y.i)
}
```
gives much simpler codegen:
```asm
good_eq:
        mov     rax, qword ptr [rsi]
        cmp     qword ptr [rdi], rax
        sete    al
        ret
```
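The single `cmp qword ptr` works because "both `u32` fields are equal" is the same condition as "one 64-bit value holding both fields is equal". The sketch below checks that equivalence at the source level; `pack`, `fieldwise_eq` and `packed_eq` are hypothetical helpers, and `pack` builds the `u64` arithmetically rather than reinterpreting memory, since `repr(Rust)` doesn't guarantee the field layout:

```rust
#[derive(Copy, Clone, Debug)]
pub struct Entity {
    g: u32,
    i: u32,
}

// Hypothetical helper: `g` goes in the high 32 bits, `i` in the low 32 bits.
// The halves don't overlap, so distinct (g, i) pairs give distinct u64 values.
fn pack(e: &Entity) -> u64 {
    ((e.g as u64) << 32) | (e.i as u64)
}

// Non-short-circuiting field comparison, as in `good_eq`.
pub fn fieldwise_eq(x: &Entity, y: &Entity) -> bool {
    (x.g == y.g) & (x.i == y.i)
}

// A single 64-bit comparison of the packed values.
pub fn packed_eq(x: &Entity, y: &Entity) -> bool {
    pack(x) == pack(y)
}
```

This only demonstrates that the two conditions agree; it says nothing about what code either function compiles to.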
This appears to be because LLVM doesn't know whether the second field is poison: Alive2 confirms that LLVM isn't allowed to transform the former into the latter (at least in their optimized forms): https://alive2.llvm.org/ce/z/bAsJGN
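The underlying issue is an evaluation-order difference: `&&` evaluates its right operand only when the left one is `true`, while `&` on `bool` always evaluates both. Rewriting the derived form as `good_eq` therefore means loading the second field even on the path where the original never touched it, which is only sound if that load can't produce poison. The sketch below makes the difference observable with a hypothetical `Counter` that records how often the right-hand comparison runs:

```rust
use std::cell::Cell;

// Hypothetical instrumentation: counts evaluations of the right-hand comparison.
pub struct Counter {
    hits: Cell<u32>,
}

impl Counter {
    fn rhs(&self, a: u32, b: u32) -> bool {
        self.hits.set(self.hits.get() + 1);
        a == b
    }
}

// `&&`: the right-hand side runs only if `g1 == g2`.
pub fn short_circuit(c: &Counter, g1: u32, g2: u32, i1: u32, i2: u32) -> bool {
    g1 == g2 && c.rhs(i1, i2)
}

// `&`: both sides always run, regardless of the first result.
pub fn eager(c: &Counter, g1: u32, g2: u32, i1: u32, i2: u32) -> bool {
    (g1 == g2) & c.rhs(i1, i2)
}
```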
Is there some metadata we could put on the parameter attributes to tell LLVM that reading through them can't produce poison? Reading the fields up front (same Godbolt link as above), like this,
```rust
#[no_mangle]
pub fn failed_workaround(x: &Entity, y: &Entity) -> bool {
    let Entity { g: g1, i: i1 } = *x;
    let Entity { g: g2, i: i2 } = *y;
    g1 == g2 && i1 == i2
}
```
still isn't enough for LLVM to remove the short-circuiting: even though that emits the `!noundef` loads first, LLVM's `SROAPass` seems to move them behind the branch from `&&`.
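In the meantime, a possible source-level workaround for small all-integer types like `Entity` is to hand-write `PartialEq` with the non-short-circuiting `&`, mirroring `good_eq`. This is a sketch; whether it actually compiles to the single 64-bit compare should be checked on Godbolt for the compiler version and target in question:

```rust
#[derive(Copy, Clone, Debug)]
pub struct Entity {
    g: u32,
    i: u32,
}

impl PartialEq for Entity {
    fn eq(&self, other: &Self) -> bool {
        // `&` on `bool` evaluates both comparisons unconditionally,
        // so there is no branch between the two field loads.
        (self.g == other.g) & (self.i == other.i)
    }
}

// Still a valid equivalence relation, so `Eq` remains correct.
impl Eq for Entity {}
```

The cost of this approach is maintenance: unlike the derive, the manual impl won't pick up new fields automatically.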
FWIW, clang (trunk) has the same codegen difference: https://cpp.godbolt.org/z/bbaz196GP
It might not have a choice, though, since C++ references are mostly just pointers.