suboptimal codegen when initializing an array of padded structs #122274
Description
in the following code, reset_implicit_padding
generates somewhat worse assembly under -C opt-level=3
than reset_explicit_padding
. with explicit padding generates to a memset
- i'd hoped both would memset
:
#[repr(Rust, align(2))]
#[derive(Default, Copy, Clone)]
pub struct ImplicitPadding {
v: u8,
}
#[repr(Rust, align(2))]
#[derive(Default, Copy, Clone)]
pub struct ExplicitPadding {
v: u8,
_pad: [u8; 1],
}
#[no_mangle]
pub fn reset_implicit_padding(array: &mut [ImplicitPadding; 160]) {
*array = [ImplicitPadding::default(); 160];
}
#[no_mangle]
pub fn reset_explicit_padding(array: &mut [ExplicitPadding; 160]) {
*array = [ExplicitPadding::default(); 160];
}
which compiles to these two implementations:
reset_implicit_padding:
mov eax, 7
.LBB0_1:
mov byte ptr [rdi + 2*rax - 14], 0
mov byte ptr [rdi + 2*rax - 12], 0
mov byte ptr [rdi + 2*rax - 10], 0
mov byte ptr [rdi + 2*rax - 8], 0
mov byte ptr [rdi + 2*rax - 6], 0
mov byte ptr [rdi + 2*rax - 4], 0
mov byte ptr [rdi + 2*rax - 2], 0
mov byte ptr [rdi + 2*rax], 0
add rax, 8
cmp rax, 167
jne .LBB0_1
ret
reset_explicit_padding:
mov edx, 320
xor esi, esi
jmp qword ptr [rip + memset@GOTPCREL]
godbolt for reference.
it turns out that when an array like this does become a memset it's seemingly often because llvm realized in loop-idiom that the stores can be turned into a memset, rather than ever being emitted as a memset from rustc?
i don't know enough to know if this would need rustc to initialize padding bytes as undef (?), or if they're already undef and there's some other reason llvm doesn't figure out this could be a memset. i also can't tell if this would be in conflict with this issue about padding copies or the regression test for it. there are certainly cases where copying large amounts of padding would be counterproductive, though if writing a padding byte could turn 4+2+1 bytes of stores into one 8-byte store, that's probably always a net win.
(the original code i've minimized had 6-byte or 14-byte structs aligned to 8 and 16 respectively, and for entirely indecipherable reasons also got the assignment fully unrolled, yielding ~600 mov
rather than a memset. i would much rather that be a call to memset, and have added a _pad
field for the time being.)
Activity