Safe memory zeroing #2626

Open
@gnzlbg

Description

(note: I use the terms safe and valid below with the precise meanings specified here: https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/reference/src/glossary.md#validity-and-safety-invariant)

Motivation

See rust-lang/rust#53491. In a nutshell: std::mem::zeroed() is dangerous - running this on miri (playground):

fn main() {
    let _x: &'static str = unsafe { 
        std::mem::zeroed() 
    };  // Probably instant UB
}

produces an error of the form: "type validation failed: encountered NULL reference". Obviously, in the real world, code like this will be caught in code review, but catching this stops being easy when one has a struct with multiple fields and one has to manually verify that all fields in the struct are valid when all its bits are zero. Layer a couple of user-defined types on top of each other, and a small private change to one of them down the stack can easily make code using mem::zeroed have instant undefined behavior.
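To make the layering problem concrete, here is a minimal sketch. The types `Inner` and `Outer` and the function `zeroed_outer` are hypothetical and not taken from the linked issue; the point is only that the safety argument lives in a comment that the compiler never checks.

```rust
use std::mem;

// Hypothetical FFI-style types, used only for illustration.
#[repr(C)]
#[derive(Debug)]
pub struct Inner {
    pub len: usize,
    // A later private change such as adding
    //     name: &'static str,
    // would still compile, but `zeroed_outer` below would then have
    // undefined behavior, because a null reference is not a valid value.
}

#[repr(C)]
#[derive(Debug)]
pub struct Outer {
    pub flags: u32,
    pub inner: Inner,
}

pub fn zeroed_outer() -> Outer {
    // SAFETY: today every field (u32, usize) is valid when all its bits
    // are zero. Nothing verifies this claim at compile time.
    unsafe { mem::zeroed() }
}
```

With the fields as written, `zeroed_outer()` returns a value whose integer fields are all `0`; the hazard only appears once a type further down the stack changes.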

When this happens, the only way we currently have to detect it is to have a test suite and run it on miri. However, C FFI is one of the main uses of mem::zeroed, and miri has very limited support for FFI. So even if you have a good test suite, miri won't help you here.

This RFC provides a solution that catches these errors at compile-time, allowing type users to zero-initialize types using safe Rust code reliably and allowing type authors to specify that doing this is a part of the type's API that they are committed to support (where changing this would be an API breaking change).

User-level explanation

Alternative: Zeroed trait like Default

(note: the trait name Zeroed is yet to be bikeshedded - I think it would be better to agree on the approach and the semantics, and when that has consensus, we can bikeshed the name at the end).

(note: I think I prefer the marker trait + const fn approach explained below)

Add a std::zeroed::Zeroed trait, similar to Default, that denotes that the zero bit-pattern is a valid bit-pattern of the type and that this bit-pattern is safe to use. This trait is unsafe to implement - implementing it for &T would make Zeroed::zeroed have undefined behavior.

// in libcore:
mod zeroed {
    use core::mem;

    /// A trait for types whose all-zeros bit-pattern is valid and safe.
    pub unsafe trait Zeroed {
        /// Instantiates a value with all bytes equal to zero.
        fn zeroed() -> Self where Self: Sized {
            // SAFETY: implementors guarantee that all-zeros is valid for Self.
            unsafe { mem::MaybeUninit::<Self>::zeroed().assume_init() }
        }
    }
}

Implement Zeroed in core for all libcore types for which this is the case: integers, raw pointers, etc. - do not implement it for references, NonZero{...}, etc.
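As a rough sketch of what those libcore impls could look like, the block below defines a local stand-in for the proposed trait and covers only integers and raw pointers to sized types. The macro name `impl_zeroed` is hypothetical, and the exact set of types that libcore would cover is not settled by this proposal.

```rust
use std::mem::MaybeUninit;

/// Local stand-in for the proposed trait; the real one would live in libcore.
pub unsafe trait Zeroed {
    /// Instantiates a value with all bytes equal to zero.
    fn zeroed() -> Self
    where
        Self: Sized,
    {
        // SAFETY: implementors promise that all-zeros is a valid and safe value.
        unsafe { MaybeUninit::<Self>::zeroed().assume_init() }
    }
}

// Hypothetical helper macro: one empty `unsafe impl` per listed type.
macro_rules! impl_zeroed {
    ($($t:ty),* $(,)?) => {
        $( unsafe impl Zeroed for $t {} )*
    };
}

impl_zeroed!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize);

// A null raw pointer is a valid raw pointer value.
unsafe impl<T> Zeroed for *const T {}
unsafe impl<T> Zeroed for *mut T {}

// Deliberately no impls for `&T`, `&mut T`, or the `NonZero*` types.
```

With this in scope, `u32::zeroed()` yields `0` and `<*const u8>::zeroed()` yields a null pointer, while `<&u8>::zeroed()` fails to compile because the bound is not satisfied.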

Add a custom derive, Zeroed, that can be used to implement this trait for user-defined types without writing unsafe Rust (e.g. if all the fields of a struct implement Zeroed). If the struct cannot derive Zeroed, that should produce a compile-time error. Whether the all-zeros bit-pattern is a valid and safe bit-pattern for a type is an API contract between the author of the type and its users. This is why opting in explicitly, instead of using an auto trait, feels like a better solution to the problem.

/// A type that is valid to zero-initialize, 
/// but not safe - this type does not derive Zeroed
struct Foo(u32);
impl Foo {
    pub fn new() -> Self { Self(1) }
    pub fn foo(&self) -> NonZeroU32 {
        // If this type was Zeroed, safe Rust code could
        // invoke undefined behavior
        unsafe { NonZeroU32::new_unchecked(self.0) }
    }
}

/// A type that is valid and safe to zero-initialize
#[derive(Zeroed)]
struct Bar(u32);
impl Bar {
    // bar is unsafe because the type can be zeroed
    // (Safety: call me only if self.0 != 0)
    pub unsafe fn bar(&self) -> NonZeroU32 {
        unsafe { NonZeroU32::new_unchecked(self.0) }
    }
    pub fn bar2(&self) -> NonZeroU32 {
        NonZeroU32::new(self.0).unwrap() // panics if self.0 == 0
    }
}

/// This produces a compilation error, since this type is not valid to zero initialize
#[derive(Zeroed)]
struct Baz(u32, &'static str);
// ERROR: self.1 (&'static str) is not Zeroed
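One plausible expansion of the derive, written out by hand, is an `unsafe impl` with one `where` bound per field. This is only a sketch: the proposal does not specify the expansion, and the trait here is a local stand-in for the proposed one.

```rust
use std::mem::MaybeUninit;
use std::num::NonZeroU32;

/// Local stand-in for the proposed trait.
pub unsafe trait Zeroed {
    fn zeroed() -> Self
    where
        Self: Sized,
    {
        // SAFETY: implementors promise that all-zeros is a valid and safe value.
        unsafe { MaybeUninit::<Self>::zeroed().assume_init() }
    }
}

unsafe impl Zeroed for u32 {}

pub struct Bar(u32);

// What `#[derive(Zeroed)]` on `Bar` might generate (hypothetical expansion):
// the impl only holds if every field type is itself `Zeroed`.
unsafe impl Zeroed for Bar where u32: Zeroed {}

impl Bar {
    /// Checked accessor: returns `None` for a zeroed `Bar`.
    pub fn checked(&self) -> Option<NonZeroU32> {
        NonZeroU32::new(self.0)
    }
}

#[allow(dead_code)]
pub struct Baz(u32, &'static str);

// The analogous expansion for `Baz` is rejected by the compiler, because
// the bound `&'static str: Zeroed` does not hold:
//
// unsafe impl Zeroed for Baz where u32: Zeroed, &'static str: Zeroed {}
```

Because the bounds are checked per field, swapping a field of `Bar` for a reference type would turn the derive into a compile-time error instead of silent undefined behavior. The `checked` method is an illustrative addition, not part of the proposal.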

To upgrade code that previously was using mem::zeroed() to Zeroed, one changes:

let x: Bar = unsafe { mem::zeroed() };

to

// potentially adding a: use std::zeroed::Zeroed;
let x = Bar::zeroed();

We should probably add Zeroed to the std::prelude::v1.

After this change we can deprecate std::mem::zeroed with a deprecation warning "use std::zeroed::Zeroed instead".

An RFC for this feature would probably leave this as an unresolved question, but we should probably turn that deprecation warning into an error in the next edition. That is, for crates opting into a newer edition (e.g. the 2021 edition), using std::mem::zeroed should be an error carrying the deprecation message. libcore will contain mem::zeroed forever, so that Rust code using older editions can still use it, but we probably want a mechanism to ban its use in code that decides to use a newer edition.

Alternatives

auto trait

We could make Zeroed an auto-trait, but then I don't see how it could be used to denote that a zeroed value is safe to use - we could still use it to denote that the value is valid. This has two consequences:

  • zeroinit would need to be unsafe, since the resulting value might not be safe to use. That the zero bit-pattern does not cause undefined behavior instantaneously does not rule out that safe methods on the type have a pre-condition that the bit-pattern is not all zeros. The author of the type might not want to provide a way to safely construct a value with such a bit-pattern, and it would be bad for the author to have to opt out of this auto trait to maintain safety.

  • Being able to zero-initialize a type is something that users of the type should be able to rely on. Once the type author commits to providing this API, a change in the type that breaks it should at least be noticed automatically. With the non-auto trait + derive this happens automatically. With an auto trait, type authors would need to add a test for this (e.g. fn foo() -> T { Zeroed::zeroed() } or similar).
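The kind of test mentioned in the second point could be a compile-time assertion rather than a runtime test. The sketch below is an assumption-laden illustration: auto traits are not available on stable Rust, so `Zeroed` is modeled as an ordinary marker trait with hand-written impls, and the names `Header` and `assert_zeroed` are hypothetical.

```rust
/// Marker-only stand-in for the proposed trait. With a real auto trait the
/// impl for `Header` below would be inferred instead of written by hand.
pub unsafe trait Zeroed {}

unsafe impl Zeroed for u32 {}
unsafe impl Zeroed for u64 {}

pub struct Header {
    pub version: u32,
    pub length: u64,
}

unsafe impl Zeroed for Header {}

/// Compiles only if `T: Zeroed` holds; does nothing at runtime.
pub const fn assert_zeroed<T: Zeroed>() {}

// Type authors would add one line like this per type whose zero-initialization
// is part of the public API. It becomes a compile error as soon as `Header`
// stops being `Zeroed`.
const _: () = assert_zeroed::<Header>();
```

The assertion has to be written and maintained by the type author, which is the extra burden this alternative carries compared to the derive.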

marker trait

We could also make Zeroed a marker trait, and have some function like:

const fn zeroed2<T: marker::Zeroed>() -> T {
    // SAFETY: T: Zeroed guarantees that all-zeros is valid for T.
    unsafe { mem::MaybeUninit::<T>::zeroed().assume_init() }
}

This approach has the advantage that zeroed2 is a const fn. The disadvantage is that we can't call this function mem::zeroed because such a function already exists, so we'd have to either put it somewhere else or call it something else (e.g. mem::zeroinit?).

There is an RFC (rust-lang/const-eval#8) that would solve this problem by allowing us to:

pub trait Zeroed {
    #[default_method_body_is_const]
    fn zeroed() -> Self where Self: Sized { ... }
}

to indicate that the default method impl is const, and then allowing the Zeroed derive to perform a:

impl const Zeroed for $id { ... }

to add a const impl. However, the Zeroed::zeroed trait + trait method approach gives users the flexibility of adding their own zeroed implementations, and I don't think this is a flexibility that we want. Paying for this flexibility might not be a good idea.

The simplicity of an unsafe to implement marker trait + a const fn somewhere in libcore is definitely appealing.
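A minimal sketch of the marker trait + const fn variant follows. It assumes a recent stable toolchain on which `MaybeUninit::zeroed` is usable in const contexts; the array impl and the items `ZERO` and `BUFFER` are illustrative assumptions, not part of the proposal.

```rust
use std::mem::MaybeUninit;

/// Marker-only stand-in for the proposed trait: no methods, so users
/// cannot supply their own `zeroed` implementation.
pub unsafe trait Zeroed {}

unsafe impl Zeroed for u8 {}
unsafe impl Zeroed for u32 {}
// Assumption: an array is `Zeroed` whenever its element type is.
unsafe impl<T: Zeroed, const N: usize> Zeroed for [T; N] {}

/// The single place where zero-initialization happens.
pub const fn zeroed2<T: Zeroed>() -> T {
    // SAFETY: `T: Zeroed` guarantees that all-zeros is valid and safe for `T`.
    unsafe { MaybeUninit::<T>::zeroed().assume_init() }
}

// Because `zeroed2` is a `const fn`, it can initialize constants and statics.
pub const ZERO: u32 = zeroed2();
pub static BUFFER: [u8; 16] = zeroed2();
```

Here the only unsafe code a type author writes is the `unsafe impl`, and the zeroing logic cannot be overridden per type.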

Labels: A-unsafe (Unsafe related proposals & ideas)