Description
I was looking for the best way to move all elements out of a Vec and return them, which is basically what I'd write in C++ as:
#include <utility>
#include <vector>

struct A {
    std::vector<int> moveOutOfVec() {
        return std::move(myV); // this empties the vector
    }
    std::vector<int> myV;
};

std::vector<int> test(A& a) {
    return a.moveOutOfVec();
}
This code generates very good assembly with clang (link to Godbolt):
mov rax, rdi
movups xmm0, xmmword ptr [rsi]
movups xmmword ptr [rdi], xmm0
mov rcx, qword ptr [rsi + 16]
mov qword ptr [rdi + 16], rcx
xorps xmm0, xmm0
movups xmmword ptr [rsi], xmm0
mov qword ptr [rsi + 16], 0
ret
So I tried to do the equivalent in Rust.
Code
NOTE: I used rustc -O for all of the following tests.
I tried this code:
pub struct A {
    v: Vec<i32>
}

impl A {
    #[inline]
    pub fn move_out_of_vec(&mut self) -> Vec<i32> {
        let mut v = Vec::default();
        std::mem::swap(&mut v, &mut self.v);
        v
    }
}

pub fn test(a: &mut A) -> Vec<i32> {
    a.move_out_of_vec()
}
I expected to see assembly similar to the C++ version.
Instead, this was the output (link to Godbolt):
example::foo:
sub rsp, 24
mov rax, rdi
mov qword ptr [rdi], 4
xorps xmm0, xmm0
movups xmmword ptr [rdi + 8], xmm0
mov rcx, qword ptr [rdi + 16]
mov qword ptr [rsp + 16], rcx
mov rcx, qword ptr [rdi]
mov qword ptr [rsp], rcx
mov rcx, qword ptr [rdi + 8]
mov qword ptr [rsp + 8], rcx
mov rcx, qword ptr [rsi + 16]
mov qword ptr [rdi + 16], rcx
movups xmm0, xmmword ptr [rsi]
movups xmmword ptr [rdi], xmm0
mov rcx, qword ptr [rsp + 16]
mov qword ptr [rsi + 16], rcx
mov rcx, qword ptr [rsp]
mov qword ptr [rsi], rcx
mov rcx, qword ptr [rsp + 8]
mov qword ptr [rsi + 8], rcx
add rsp, 24
ret
Version it worked on
The same code in 1.44.0 generates much better assembly, but only if you mark the method as #[inline] (link to Godbolt):
example::foo:
sub rsp, 24
mov rax, rdi
mov rcx, qword ptr [rsi]
movups xmm0, xmmword ptr [rsi + 8]
movaps xmmword ptr [rsp], xmm0
mov qword ptr [rsi], 4
xorps xmm0, xmm0
movups xmmword ptr [rsi + 8], xmm0
mov qword ptr [rdi], rcx
movaps xmm0, xmmword ptr [rsp]
movups xmmword ptr [rdi + 8], xmm0
add rsp, 24
ret
Version with regression
From 1.45.0 up to the current nightly, the generated code is worse (#[inline] doesn't affect the output).
Notes
I also tried using drain and split_off to achieve the same thing, but both generate far worse assembly, with jumps and all.
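For reference, here is a minimal sketch of what the drain and split_off variants might look like. The exact code used for those tests is not shown in this report, so the function names below are hypothetical and the bodies are an assumption about how those methods would be applied. std::mem::take is included as a shorter spelling of the swap with Vec::default(); it was not part of the measurements above.

```rust
// Hypothetical reconstructions of the alternatives mentioned in the notes.
// The function names are illustrative and not taken from the original tests.

/// Drains every element into a freshly collected Vec.
/// The source Vec is left empty but keeps its allocation.
pub fn take_with_drain(v: &mut Vec<i32>) -> Vec<i32> {
    v.drain(..).collect()
}

/// Splits at index 0, so the returned Vec holds all of the elements
/// and the source Vec is left empty.
pub fn take_with_split_off(v: &mut Vec<i32>) -> Vec<i32> {
    v.split_off(0)
}

/// Replaces the source with Vec::default() and returns the old value.
/// This has the same effect as the swap-based move_out_of_vec above.
pub fn take_with_mem_take(v: &mut Vec<i32>) -> Vec<i32> {
    std::mem::take(v)
}
```

All three return the original elements and leave the source empty, so they are interchangeable as far as behavior goes; the difference reported here is only in the generated code.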