Skip to content

Improve LLVM optimization across function calls #21465

Closed
@mahkoh

Description

@mahkoh

Vec<T> has an extend function which is equivalent to push_all except that it accepts an iterator. push_all optimizes to memcpy but extend does not. Consider the following unsafe Iterator:

struct MyIntoIter<T> {
    ptr: *const T,
    end: *const T
}

impl<T> MyIntoIter<T> {
    fn from_vec(mut v: Vec<T>) -> MyIntoIter<T> {
        unsafe {
            let ptr = v.as_mut_ptr();
            let begin = ptr as *const T;
            let end = if mem::size_of::<T>() == 0 {
                (ptr as usize + v.len()) as *const T
            } else {
                ptr.offset(v.len() as isize) as *const T
            };
            mem::forget(v);
            MyIntoIter { ptr: begin, end: end }
        }
    }
}

trait RawIterator {
    type Item;

    fn next(&mut self) -> Self::Item;
    fn ptr(&self) -> *const Self::Item;
    fn len(&self) -> usize;
}

impl<T> RawIterator for MyIntoIter<T> {
    type Item = T;

    fn next(&mut self) -> T {
        unsafe {
            let old = self.ptr;
            self.ptr = self.ptr.offset(1);
            ptr::read(old)
        }
    }

    fn ptr(&self) -> *const T {
        self.ptr
    }

    fn len(&self) -> usize {
        (self.end as usize - self.ptr as usize) / mem::size_of::<T>()
    }
}

This has all safety features removed. The user is responsible for not calling next when there are no more elements. One can assume that the normal Iterator will never optimize better than this Iterator.

Consider the following variants of extend:

#![allow(unstable)]

use std::{ptr, mem};

#[inline(never)]
fn extend4<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, mut iter: I) {
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let end = ptr.offset(len as isize);
        while ptr != end {
            ptr::write(ptr, iter.next());
            ptr = ptr.offset(1);
        }
    }
}

#[inline(never)]
fn extend5<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, mut iter: I) {
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut dst_ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let dst_end = dst_ptr.offset(len as isize);
        let mut src_ptr = iter.ptr();
        while dst_ptr != dst_end {
            ptr::write(dst_ptr, ptr::read(src_ptr));
            dst_ptr = dst_ptr.offset(1);
            src_ptr = src_ptr.offset(1);
        }
    }
}

fn main() {
    let src: Vec<_> = (0..6400000us).map(|x| x as u8).collect();
    let mut x: Vec<u8> = vec!();

    //extend4(&mut x, MyIntoIter::from_vec(src));
    extend5(&mut x, MyIntoIter::from_vec(src));
}

You can see that extend5 is nothing but extend4 with next inlined manually. However, extend5 optimizes to a memcpy while extend4 does not.

From this we can see that adding an unsafe ExactSizeIterator trait will not improve the extend performance on its own.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-codegenArea: Code generationI-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions