
Cloning a 1MB vector is 30x slower than cloning a 1MB ~str #13472

Closed
@kmcallister


Benchmark program at the end. The results:

$ rustc -v
rustc 0.11-pre (cee9a83 2014-04-11 15:54:46 -0700)
host: x86_64-unknown-linux-gnu

$ rustc -O --test foo.rs && ./foo --bench

running 5 tests
test clone_owned          ... bench:   5319322 ns/iter (+/- 166831)
test clone_owned_to_owned ... bench:   5293984 ns/iter (+/- 125331)
test clone_str            ... bench:     85526 ns/iter (+/- 1333)
test clone_vec            ... bench:   3332139 ns/iter (+/- 17227)
test test_memcpy          ... bench:     85931 ns/iter (+/- 563)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured
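To make the comparison concrete, the ns/iter figures can be turned into throughput, because every benchmark copies size = 1024*1024 bytes per iteration. The following is a small sketch in modern Rust. The helper name throughput_mb_per_s is illustrative, and MB here means 10^6 bytes.

```rust
// Each benchmark copies SIZE_BYTES per iteration (size = 1024*1024 in the program).
const SIZE_BYTES: f64 = 1024.0 * 1024.0;

/// Illustrative helper: convert a ns/iter figure into MB/s (10^6 bytes per second).
fn throughput_mb_per_s(ns_per_iter: f64) -> f64 {
    SIZE_BYTES / (ns_per_iter * 1e-9) / 1e6
}

fn main() {
    // (benchmark name, ns/iter) copied from the first run above.
    let results = [
        ("clone_owned", 5_319_322.0),
        ("clone_owned_to_owned", 5_293_984.0),
        ("clone_str", 85_526.0),
        ("clone_vec", 3_332_139.0),
        ("test_memcpy", 85_931.0),
    ];
    for &(name, ns) in &results {
        println!("{:<22} {:>9.1} MB/s", name, throughput_mb_per_s(ns));
    }
    // How much slower the generic Vec<u8> clone is than the ~str clone.
    println!("clone_vec / clone_str = {:.1}x", 3_332_139.0 / 85_526.0);
}
```

With these inputs clone_vec works out to roughly 315 MB/s, while clone_str and test_memcpy both land at roughly 12 GB/s, so on this run the generic Vec<u8> clone is about 39x slower than the ~str clone.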

For clone_vec that comes out to about 300 MB/s, which is really bad for a memory copy. I'm guessing this has to do with the fact that Vec<T> is generic. Its clone looks like

impl<T:Clone> Clone for Vec<T> {
    fn clone(&self) -> Vec<T> {
        self.iter().map(|x| x.clone()).collect()
    }
}

and LLVM probably isn't smart enough to optimize this down to a memcpy. Either way, vectors of primitive numeric types need to be cheap to copy. If the optimizer can't get us there, we need something like

impl<T: POD> Vec<T> {
    fn fast_clone(&self) -> Vec<T> {
        let mut vector = Vec::with_capacity(self.len);
        unsafe {
            vector.set_len(self.len);
            vector.copy_memory(self.as_slice());
        }
        vector
    }
}

(untested). I can also imagine a language feature which would let you write

impl<T: Clone> Clone for Vec<T> {
    fn clone(&self) -> Vec<T> {
        if implements_trait!(T, POD) {
            // ...
        } else {
            // ...
        }
    }
}

This is worryingly close to C++ template specialization, but might be worth it for core data structures. It's really counterintuitive to need a special vector type or a special clone method just to get acceptable performance on vectors of primitive integers.
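For comparison, here is a minimal sketch of the fast_clone idea in present-day Rust. It assumes `Copy` plays the role of the `POD` bound, and it writes it as a free function over a slice. The names fast_clone and generic_clone are illustrative, not standard-library APIs. One deliberate difference from the untested version above: the length is set only after the elements have been written.

```rust
use std::ptr;

/// Illustrative: clone a slice of `Copy` elements with one bulk memory copy.
/// `T: Copy` stands in for the `POD` bound proposed in the issue.
fn fast_clone<T: Copy>(src: &[T]) -> Vec<T> {
    let mut dst: Vec<T> = Vec::with_capacity(src.len());
    unsafe {
        // SAFETY: `dst` has capacity for `src.len()` elements, the buffers
        // cannot overlap (`dst` is freshly allocated), and `T: Copy` means a
        // bitwise copy is a valid clone of every element.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        // Only mark the elements as initialized once they have been written.
        dst.set_len(src.len());
    }
    dst
}

/// Illustrative: the element-by-element clone quoted in the issue.
fn generic_clone<T: Clone>(src: &[T]) -> Vec<T> {
    src.iter().map(|x| x.clone()).collect()
}

fn main() {
    let v: Vec<u8> = vec![0x78; 1024 * 1024];
    let a = fast_clone(&v);
    let b = generic_clone(&v);
    assert_eq!(a, v);
    assert_eq!(b, v);
    // The safe, idiomatic route needs no `unsafe` at all.
    let c = v.to_vec();
    assert_eq!(c, v);
}
```

In safe code, `to_vec()` or `extend_from_slice` is the usual way to get the same result. Whether any of these variants actually lowers to a single memcpy depends on the standard library and optimizer in use, so it is worth measuring rather than assuming.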

Bonus weirdness: If you comment out clone_owned, clone_owned_to_owned gets significantly faster (about 3.36 ms/iter instead of 5.29 ms/iter, though still way too slow):

running 4 tests
test clone_owned_to_owned ... bench:   3355442 ns/iter (+/- 69257)
test clone_str            ... bench:     78866 ns/iter (+/- 5433)
test clone_vec            ... bench:   3346685 ns/iter (+/- 134001)
test test_memcpy          ... bench:     85116 ns/iter (+/- 3570)

If you comment out clone_owned_to_owned instead, nothing in particular happens.

Here's the benchmark program:

extern crate test;
extern crate libc;

use test::{Bencher, black_box};
use libc::size_t;
use std::slice;

static size: uint = 1024*1024;

#[bench]
fn clone_str(bh: &mut Bencher) {
    let mut x = StrBuf::with_capacity(size);
    for _ in range(0, size) {
        x.push_char('x');
    }
    let x: ~str = x.into_owned();
    bh.iter(|| black_box(x.clone()));
}

#[bench]
fn clone_vec(bh: &mut Bencher) {
    let mut x: Vec<u8> = Vec::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    bh.iter(|| black_box(x.clone()));
}

#[bench]
fn clone_owned(bh: &mut Bencher) {
    let mut x: ~[u8] = slice::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    bh.iter(|| black_box(x.clone()));
}

#[bench]
fn clone_owned_to_owned(bh: &mut Bencher) {
    let mut x: ~[u8] = slice::with_capacity(size);
    for _ in range(0, size) {
        x.push(0x78);
    }
    let y = x.to_owned();
    bh.iter(|| black_box(y.clone()));
}

extern {
    fn memcpy(dest: *mut u8, src: *u8, n: size_t);
}

#[bench]
fn test_memcpy(bh: &mut Bencher) {
    let src = ~[0x78_u8, ..size];
    let mut dst = ~[0_u8, ..size];
    bh.iter(|| {
        unsafe {
            memcpy(dst.as_mut_ptr(), src.as_ptr(), size as size_t);
        }
    })
}
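The #[bench] harness used above still requires nightly Rust. A rough stable-Rust sketch of the same comparison can time each operation with std::time::Instant and std::hint::black_box instead. In this sketch, String stands in for the old ~str and slice::copy_from_slice stands in for the memcpy FFI call. The helper ns_per_iter and the iteration count are hypothetical choices. Timings will vary by machine and compiler version, and no particular results are implied.

```rust
use std::hint::black_box;
use std::time::Instant;

// Same buffer size as the original benchmark: 1024*1024 bytes.
const SIZE: usize = 1024 * 1024;

/// Hypothetical helper: average wall-clock nanoseconds per call of `f`.
fn ns_per_iter<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    let v: Vec<u8> = vec![0x78; SIZE];
    // String plays the role of the old ~str.
    let s: String = "x".repeat(SIZE);
    let mut dst = vec![0u8; SIZE];
    let iters = 200;

    let vec_ns = ns_per_iter(iters, || {
        black_box(v.clone());
    });
    let str_ns = ns_per_iter(iters, || {
        black_box(s.clone());
    });
    // Safe stand-in for memcpy: copy_from_slice requires equal lengths (SIZE here).
    let copy_ns = ns_per_iter(iters, || {
        dst.copy_from_slice(black_box(&v));
        black_box(&dst);
    });

    println!("clone_vec   {:>12.0} ns/iter", vec_ns);
    println!("clone_str   {:>12.0} ns/iter", str_ns);
    println!("copy_slice  {:>12.0} ns/iter", copy_ns);
}
```

This loop-and-average approach is much cruder than a real benchmark harness (no warm-up, no variance estimate), so treat its output only as a sanity check on relative costs.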

Labels: I-slow (Issue: Problems and improvements with respect to performance of generated code.)