Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memset()/memclr() emitted for Vec::new() #305

Closed
jordens opened this issue Jul 24, 2022 · 2 comments
Closed

memset()/memclr() emitted for Vec::new() #305

jordens opened this issue Jul 24, 2022 · 2 comments

Comments

@jordens
Copy link
Contributor

jordens commented Jul 24, 2022

memclr()/memset() calls can be emitted, e.g. if Vec is contained in another structure (even if its position is last):

use heapless::Vec;

pub struct S {
    x: u32,
    v: Vec<u8, 28>,
}

pub fn s2() -> S {
    S { x: 0, v: Vec::new() }
}
example::s2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #36
        bl      __aeabi_memclr4
        pop     {r7, pc}

Godbolt

Complete code to reproduce with a couple variations
#![no_std]
#![allow(unused)]

// Vec taken (minus comments) from
// https://github.com/japaric/heapless/blob/14639156894125d75b908405cc228eb4b550c85d/src/vec.rs#L36-L72
use core::mem::MaybeUninit;

pub struct Vec<T, const N: usize> {
    len: usize,
    buffer: [MaybeUninit<T>; N],
}

impl<T, const N: usize> Vec<T, N> {
    const ELEM: MaybeUninit<T> = MaybeUninit::uninit();
    const INIT: [MaybeUninit<T>; N] = [Self::ELEM; N];

    pub const fn new() -> Self {
        Self {
            len: 0,
            buffer: Self::INIT,
        }
    }

}

pub struct S {
    x: u32,
    v: Vec<u8, 28>,
}

pub struct P {
    x: u8,
    v: Vec<u8, 32>,
}

/// This is the only case (of the four shown below) that does not clear/set the entire Vec
pub fn s1() -> S {
    S { x: 1, v: Vec::new() }
}

pub fn s2() -> S {
    S { x: 0, v: Vec::new() }
}

pub fn p1() -> P {
    P { x: 1, v: Vec::new() }
}

pub fn p2() -> P {
    P { x: 0, v: Vec::new() }
}

Result:

example::s1:
        movs    r1, #0
        movs    r2, #1
        strd    r2, r1, [r0]
        bx      lr

example::s2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #36
        bl      __aeabi_memclr4
        pop     {r7, pc}

example::p1:
        push    {r4, r6, r7, lr}
        add     r7, sp, #8
        mov     r4, r0
        adds    r0, #4
        movs    r1, #33
        movs    r2, #1
        bl      __aeabi_memset4
        movs    r0, #0
        str     r0, [r4]
        pop     {r4, r6, r7, pc}

example::p2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #37
        bl      __aeabi_memclr4
        pop     {r7, pc}

When hunting down a few unexpected and expensive memclr()/memset() I noticed that they are generated for Vec::new().
This is even though Vec almost exclusively consists of MaybeUninit::uninit() and also specifically tunes its layout so that the only initialized element (the length) is early. The optimization mentioned in the code comment to #290 does appear to be very desirable but also very easy to miss out on.

From a cursory search the result is the same for all rust versions between current nightly and 1.56. With 1.55 all four cases receive memset()/memclr() treatment. This is -Copt-level=3 --target=thumbv7em-none-eabihf. I didn't try other targets or optimization levels.

This is probably not a bug in heapless, and there may not even be a viable way to improve the situation even from the rust/llvm perspective. But it looks like a surprising data point and was certainly a pitfall in my case.

Under what circumstances can the compiler be expected to actually forego the initialization of MaybeUninit::uninit()?
Or are we using it (Vec or MaybeUninit) wrong?

jordens added a commit to quartiq/stabilizer that referenced this issue Jul 24, 2022
jordens added a commit to quartiq/stabilizer that referenced this issue Jul 24, 2022
bors bot added a commit to quartiq/stabilizer that referenced this issue Jul 25, 2022
566: pounder: tune initialization r=ryan-summers a=jordens

rust-embedded/heapless#305


Co-authored-by: Robert Jördens <rj@quartiq.de>
@jordens
Copy link
Contributor Author

jordens commented Jul 25, 2022

Related:

They appear to indicate that this is indeed an issue in LLVM that is likely resolved in LLVM 15.

@jordens jordens closed this as completed Jul 25, 2022
@jordens
Copy link
Contributor Author

jordens commented Aug 16, 2022

Current nightly uses llvm 15 and does not generate the memset/memclrs anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant