Cross language lto fails in presence of C++ destructor #141038

Open
@HKalbasi

Description

I'm working on a C++/Rust interop tool called Zngur, and I noticed this problem while benchmarking it. Here is a reduced version without any Zngur-related code.

This C++ code:

#include <cstdint>
#include <array>

extern "C" {
    void push_to_vec(void *v, uint64_t i);
    void new_vec_in_stack(void *v);
    void free_vec_in_stack(void *v);
}

struct MyVec {
    alignas(8) std::array<uint8_t, 24> data;

    MyVec() {
        new_vec_in_stack(reinterpret_cast<void*>(data.begin()));
    }

    // ~MyVec() {
    //     free_vec_in_stack(reinterpret_cast<void*>(data.begin()));
    // }
};

void build_vec(int n)
{
    MyVec v;
    void* vec = reinterpret_cast<void*>(v.data.begin());

    for (int i = 0; i < n; i++)
    {
        push_to_vec(vec, i);
    }

    free_vec_in_stack(vec);
}


extern "C" {
    void do_the_job()
    {
        for (int i = 0; i < 100000; i++)
        {
            build_vec(10000);
        }
    }
}

It becomes significantly (about 2x) slower if I use the destructor (commented out above) instead of manually calling free_vec_in_stack at the end of the build_vec function. Even an empty destructor makes it 2x slower. Marking the destructor as inline doesn't help.

Here is the Rust driver code:

use std::ffi::c_void;
use std::time::Instant;

#[unsafe(no_mangle)]
pub extern "C" fn new_vec_in_stack(v: *mut c_void) {
    unsafe {
        std::ptr::write(v as *mut Vec<u64>, vec![]);
    }
}

#[unsafe(no_mangle)]
pub extern "C" fn free_vec_in_stack(v: *mut c_void) {
    unsafe {
        _ = std::ptr::read(v as *mut Vec<u64>);
    }
}

#[unsafe(no_mangle)]
pub extern "C" fn push_to_vec(v: *mut c_void, i: u64) {
    let v = unsafe { &mut *(v as *mut Vec<u64>) };
    v.push(i);
}

unsafe extern "C" {
    fn do_the_job();
}

fn build_vec(n: u64) -> Vec<u64> {
    let mut r = vec![];
    for i in 0..n {
        r.push(i);
    }
    r
}

fn main() {
    let start = Instant::now();
    for _ in 0..100_000 {
        std::hint::black_box(build_vec(10000));
    }
    println!("Pure rust = {:?}", start.elapsed());

    let start = Instant::now();
    unsafe {
        do_the_job();
    }
    println!("Cross language = {:?}", start.elapsed());
}
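The memory contract between the two sides can be checked in isolation. The following is a minimal standalone sketch, not part of the reproduction: it assumes a 64-bit target where Vec<u64> fits in 24 bytes with 8-byte alignment (the layout the C++ MyVec buffer relies on), and the names Storage and roundtrip are hypothetical. The three functions repeat the logic of the driver above without #[unsafe(no_mangle)], so the sketch links on its own.

```rust
use std::ffi::c_void;
use std::mem::{align_of, size_of};

/// Hypothetical Rust mirror of the C++ `MyVec` buffer: 24 bytes, 8-byte aligned.
#[repr(C, align(8))]
struct Storage([u8; 24]);

// Layout assumption made by the C++ side. It holds on common 64-bit targets
// but is not guaranteed by the language, so it is checked at compile time.
const _: () = assert!(size_of::<Vec<u64>>() <= size_of::<Storage>());
const _: () = assert!(align_of::<Vec<u64>>() <= align_of::<Storage>());

extern "C" fn new_vec_in_stack(v: *mut c_void) {
    // Initialize the caller's buffer with an empty Vec without reading old bytes.
    unsafe { std::ptr::write(v as *mut Vec<u64>, Vec::new()) }
}

extern "C" fn push_to_vec(v: *mut c_void, i: u64) {
    let v = unsafe { &mut *(v as *mut Vec<u64>) };
    v.push(i);
}

extern "C" fn free_vec_in_stack(v: *mut c_void) {
    // Move the Vec out of the buffer and drop it, freeing its heap allocation.
    unsafe { drop(std::ptr::read(v as *mut Vec<u64>)) }
}

/// Mirrors the C++ `build_vec`: construct, push `n` values, free.
/// Returns the length and the sum observed just before freeing.
fn roundtrip(n: u64) -> (usize, u64) {
    let mut storage = Storage([0; 24]);
    let p = storage.0.as_mut_ptr() as *mut c_void;
    new_vec_in_stack(p);
    for i in 0..n {
        push_to_vec(p, i);
    }
    let (len, sum) = {
        let v = unsafe { &*(p as *const Vec<u64>) };
        (v.len(), v.iter().sum::<u64>())
    };
    free_vec_in_stack(p);
    (len, sum)
}

fn main() {
    let (len, sum) = roundtrip(10_000);
    println!("len = {len}, sum = {sum}");
}
```

If either compile-time assertion fails on some target, the 24-byte buffer in the C++ code would be too small or under-aligned there, and the benchmark would be undefined behavior rather than just slow.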

Here is the result of the version with the destructor:

Pure rust = 1.57105235s
Cross language = 3.138335498s

Here is the result of the version without the destructor:

Pure rust = 1.633161618s
Cross language = 1.655836619s

And this is the result of the version without the destructor, but with cross-language LTO (xlto) disabled:

Pure rust = 1.608407431s
Cross language = 3.019778757s
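To put the three runs side by side, the slowdown of the cross-language loop relative to the pure Rust loop can be computed directly from the timings above. This is only a small arithmetic sketch; the helper name slowdown and the run labels are illustrative.

```rust
/// Slowdown of the cross-language loop relative to the pure Rust loop.
fn slowdown(pure_rust_s: f64, cross_language_s: f64) -> f64 {
    cross_language_s / pure_rust_s
}

fn main() {
    // Timings in seconds, copied from the three runs reported above.
    let runs = [
        ("with destructor, xlto", 1.57105235, 3.138335498),
        ("without destructor, xlto", 1.633161618, 1.655836619),
        ("without destructor, no xlto", 1.608407431, 3.019778757),
    ];
    for (name, pure, cross) in runs {
        println!("{name}: {:.2}x", slowdown(pure, cross));
    }
}
```

The ratios come out to roughly 2.00x, 1.01x and 1.88x, so the destructor version with xlto enabled performs about the same as the build with xlto disabled.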

I enable xlto using this command:

cargo clean && CXX=clang++ RUSTFLAGS="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld" cargo run -r

And here is my build.rs file:

fn main() {
    cc::Build::new()
        .cpp(true)
        .file("job.cpp")
        .flag("-flto=thin")
        .compile("libjob.a");
}

Labels: LTO (link time optimization: regular/full LTO or ThinLTO)