Description
Benchmark program at the end. The results:
$ rustc -v
rustc 0.11-pre (cee9a83 2014-04-11 15:54:46 -0700)
host: x86_64-unknown-linux-gnu
$ rustc -O --test foo.rs && ./foo --bench
running 5 tests
test clone_owned ... bench: 5319322 ns/iter (+/- 166831)
test clone_owned_to_owned ... bench: 5293984 ns/iter (+/- 125331)
test clone_str ... bench: 85526 ns/iter (+/- 1333)
test clone_vec ... bench: 3332139 ns/iter (+/- 17227)
test test_memcpy ... bench: 85931 ns/iter (+/- 563)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
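For reference, the ns/iter figures can be turned into throughput with a small calculation. This is a sketch in present-day Rust syntax, not the 0.11-pre syntax used in the rest of this report. The helper name `throughput_mb_per_s` is made up for illustration, and it assumes each iteration copies exactly 1 MiB (the value of `size` in the benchmark).

```rust
/// Bytes copied per benchmark iteration (1 MiB, matching `size` in the benchmark).
const BYTES_PER_ITER: f64 = 1024.0 * 1024.0;

/// Convert a `ns/iter` figure into MB/s (1 MB = 10^6 bytes).
/// Hypothetical helper, not part of the original benchmark.
fn throughput_mb_per_s(ns_per_iter: f64) -> f64 {
    let seconds = ns_per_iter / 1e9;
    BYTES_PER_ITER / seconds / 1e6
}

fn main() {
    // Timings copied from the benchmark output above.
    for (name, ns) in [
        ("clone_owned", 5_319_322.0),
        ("clone_vec", 3_332_139.0),
        ("clone_str", 85_526.0),
        ("test_memcpy", 85_931.0),
    ] {
        println!("{}: {:.0} MB/s", name, throughput_mb_per_s(ns));
    }
}
```

With these inputs, clone_vec lands at roughly 315 MB/s while test_memcpy is around 12 GB/s.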
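For clone_vec, copying 1 MiB in about 3.3 ms comes out to roughly 300 MB/s, which is really bad for a memory copy (test_memcpy moves the same amount of data in about 0.086 ms). I'm guessing this has to do with the fact that Vec<T> is generic. Its clone looks like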
impl<T:Clone> Clone for Vec<T> {
    fn clone(&self) -> Vec<T> {
        self.iter().map(|x| x.clone()).collect()
    }
}
and LLVM probably isn't smart enough to optimize this down to a memcpy. But anyway, there is a need for efficient vectors of primitive numerical types. If this can't happen by optimization magic, we need something like
impl<T: POD> Vec<T> {
    fn fast_clone(&self) -> Vec<T> {
        let mut vector = Vec::with_capacity(self.len);
        unsafe {
            vector.set_len(self.len);
            vector.copy_memory(self.as_slice());
        }
        vector
    }
}
(untested). I can also imagine a language feature which would let you write
impl<T: Clone> Clone for Vec<T> {
    fn clone(&self) -> Vec<T> {
        if implements_trait!(T, POD) {
            // ...
        } else {
            // ...
        }
    }
}
This is worryingly close to C++ template specialization, but might be worth it for core data structures. It's really counterintuitive if you need to use a special vector type or a special clone method to get acceptable performance on vectors of primitive integers.
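To make the fast_clone idea concrete, here is a minimal sketch in present-day Rust syntax rather than the 0.11-pre syntax used elsewhere in this report. It makes two assumptions: the `Copy` bound stands in for the hypothetical `POD` trait, and the code is a free function over slices instead of an inherent method on Vec. It is not how the standard library is implemented.

```rust
use std::ptr;

/// Hypothetical helper: clone a slice of `Copy` elements with one bulk copy
/// instead of cloning element by element.
/// Assumption: `Copy` plays the role of the `POD` bound discussed above.
fn fast_clone<T: Copy>(src: &[T]) -> Vec<T> {
    let mut dst: Vec<T> = Vec::with_capacity(src.len());
    unsafe {
        // SAFETY: `dst` was just allocated with room for `src.len()` elements,
        // so the two buffers cannot overlap, and `T: Copy` guarantees that a
        // bitwise copy produces valid values.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        // Only mark the elements as initialized after they have been written.
        dst.set_len(src.len());
    }
    dst
}

fn main() {
    let original: Vec<u8> = vec![0x78; 1024 * 1024];
    let copy = fast_clone(&original);
    assert_eq!(original, copy);
}
```

Unlike the untested snippet above, this version writes the elements before calling `set_len`, so the vector never claims to hold uninitialized elements.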
Bonus weirdness: If you comment out clone_owned then clone_owned_to_owned gets significantly faster (though still way too slow):
running 4 tests
test clone_owned_to_owned ... bench: 3355442 ns/iter (+/- 69257)
test clone_str ... bench: 78866 ns/iter (+/- 5433)
test clone_vec ... bench: 3346685 ns/iter (+/- 134001)
test test_memcpy ... bench: 85116 ns/iter (+/- 3570)
If you comment out clone_owned_to_owned instead, nothing in particular happens.
Here's the benchmark program:
extern crate test;
extern crate libc;
use test::{Bencher, black_box};
use libc::size_t;
use std::slice;
static size: uint = 1024*1024;
#[bench]
fn clone_str(bh: &mut Bencher) {
    let mut x = StrBuf::with_capacity(size);
    for _ in range(0, size) {
        x.push_char('x');
    }
    let x: ~str = x.into_owned();
    bh.iter(|| black_box(x.clone()));
}
#[bench]
fn clone_vec(bh: &mut Bencher) {
    let mut x: Vec<u8> = Vec::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    bh.iter(|| black_box(x.clone()));
}
#[bench]
fn clone_owned(bh: &mut Bencher) {
    let mut x: ~[u8] = slice::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    bh.iter(|| black_box(x.clone()));
}
#[bench]
fn clone_owned_to_owned(bh: &mut Bencher) {
    let mut x: ~[u8] = slice::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    let y = x.to_owned();
    bh.iter(|| black_box(y.clone()));
}
extern {
    fn memcpy(dest: *mut u8, src: *u8, n: size_t);
}
#[bench]
fn test_memcpy(bh: &mut Bencher) {
    let src = ~[0x78_u8, ..size];
    let mut dst = ~[0_u8, ..size];
    bh.iter(|| {
        unsafe {
            // Cast to size_t so the call matches the extern declaration on every target.
            memcpy(dst.as_mut_ptr(), src.as_ptr(), size as size_t);
        }
    })
}