Disarm mem::uninitialized by having it initialize to an arbitrary valid value for each type #87675

Closed
@bstrie

Description

For a while it has been understood that the mem::uninitialized API is broken. Originally the intuitive understanding of this API was that it produced a fixed, arbitrary value. However (as extensively discussed elsewhere), uninitialized memory is not a “fixed, arbitrary value”, and for nearly all types in Rust it is instantaneous undefined behavior for a value to be uninitialized.

What’s worse, even initialized values can be insta-UB. Rust uses its understanding of valid bit patterns to perform layout optimizations whereby invalid values can be repurposed as enum tags, which is how Option<&T> can be only a single word. Thus even mem::uninitialized’s sibling mem::zeroed is insta-UB when used with types like &T.
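The layout optimization described above can be observed directly by comparing sizes. The following is a small sketch; the helper name `option_is_free` is made up for illustration and is not part of any library.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when wrapping `T` in `Option` costs no extra space,
/// i.e. when the compiler found an invalid bit pattern of `T` to represent `None`.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // A reference is never null, so the null pattern is free to encode `None`.
    assert!(option_is_free::<&u8>());
    // `Box<T>` is likewise never null.
    assert!(option_is_free::<Box<u8>>());
    // Every bit pattern of `u32` is valid, so `Option<u32>` needs a separate tag.
    assert!(!option_is_free::<u32>());
    println!("size checks passed");
}
```

The same property is what makes an all-zero `&T` invalid: the zero pattern has already been claimed to mean `None` in `Option<&T>`.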

As a result, mem::uninitialized was deprecated and replaced with mem::MaybeUninit, which avoids the problems of the former. In addition, both mem::zeroed and mem::uninitialized were altered such that they will attempt to detect (and panic) when used on certain types: the former on types that must not be zero, and the latter on any types with invalid (defined) values.
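For comparison, here is a minimal sketch of the MaybeUninit pattern that replaces mem::uninitialized. The function names `make_value` and `init_ref` are invented for this example.

```rust
use std::mem::MaybeUninit;

/// Hypothetical example: reserve a slot, write to it, then read it back.
fn make_value() -> u64 {
    // The slot holds no `u64` yet; `MaybeUninit<u64>` is allowed to be uninitialized.
    let mut slot = MaybeUninit::<u64>::uninit();
    slot.write(42);
    // SAFETY: the slot was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}

/// The same pattern works for a type with invalid bit patterns, such as `&T`,
/// because no `&T` value exists until `assume_init` is called.
fn init_ref<'a>(target: &'a i32) -> &'a i32 {
    let mut slot = MaybeUninit::<&'a i32>::uninit();
    slot.write(target);
    // SAFETY: the slot now contains a valid reference.
    unsafe { slot.assume_init() }
}

fn main() {
    let x = 7;
    println!("{} {}", make_value(), init_ref(&x));
}
```

The key difference from mem::uninitialized is that the uninitialized state lives inside the `MaybeUninit<T>` wrapper, never in a value of type `T` itself.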

However, implementing these panic checks caused a great deal of breakage (which arguably is desirable for safety, although still extremely disruptive), and to reduce disruption the check is conservative instead of exhaustive (#66151). Unfortunately, while improving the coverage of these checks will still leave mem::zeroed perfectly usable, mem::uninitialized will be rendered all but unusable, as essentially no type can ever be in an uninitialized state.

This is a problem for legacy crates that were never migrated away from mem::uninitialized. However, there is a solution that allows these legacy crates to compile while also avoiding the problem of invalid uninitialized values: mem::uninitialized can initialize with a valid value. This may seem contrary to the original intent of the API, but consider that the only reason to avoid initialization is performance, and that the choice is now between “my code doesn’t compile”, “my code contains undefined behavior”, and “my code is slower”; the last is the most desirable outcome of the three.

This raises the question: what value to initialize with? PR #87032 proposed the simplest option, which was to replace the innards of mem::uninitialized with mem::zeroed. However, zero is the value that is most often used for niche optimizations, so this would still reject a lot of code.
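To illustrate why zero is a poor default, the sketch below shows two types where the all-zero pattern is already taken as the niche for `None`. The function names are made up for this example; the calls are sound only because the target type is the `Option`, for which all-zero is a documented valid value.

```rust
use std::mem;
use std::num::NonZeroU32;

/// Hypothetical example: an all-zero `Option<NonZeroU32>` is `None`,
/// which means zero is not available as a value of `NonZeroU32` itself.
fn zeroed_option_nonzero() -> Option<NonZeroU32> {
    // SAFETY: all-zero is a valid `Option<NonZeroU32>` (it is `None`).
    unsafe { mem::zeroed() }
}

/// The same holds for references: all-zero is `None`, not a valid `&u8`.
fn zeroed_option_ref() -> Option<&'static u8> {
    // SAFETY: all-zero is a valid `Option<&u8>` (it is `None`).
    unsafe { mem::zeroed() }
}

fn main() {
    assert!(zeroed_option_nonzero().is_none());
    assert!(zeroed_option_ref().is_none());
    println!("zero is the niche for both types");
}
```

Calling mem::zeroed directly at `NonZeroU32` or `&u8` would therefore produce an invalid value, which is exactly the case the panic check is meant to catch.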

But there is a more desirable alternative. Because Rust understands which values are invalid for a type (it must, in order to perform niche optimizations), it should also understand which values are valid for a type. An intrinsic could be added to the compiler that, given a type, produces an arbitrary valid value of that type. This intrinsic could be used within mem::uninitialized, and the existing panic check could be removed. This would allow all code in the wild still using mem::uninitialized to compile, and would also avoid all insta-UB related to validity invariants.
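The idea can be approximated in library code to show the intended behavior. What follows is only a sketch: the trait `ArbitraryValid` and the function `disarmed_uninitialized` are hypothetical names, the proposal itself is a compiler intrinsic that would work for any type without a trait bound, and the specific values chosen here are arbitrary.

```rust
use std::num::NonZeroU8;

/// Hypothetical stand-in for the proposed intrinsic: each type names
/// some value that satisfies its validity invariant.
trait ArbitraryValid: Sized {
    fn arbitrary_valid() -> Self;
}

impl ArbitraryValid for u32 {
    fn arbitrary_valid() -> Self { 0 } // every bit pattern is valid
}

impl ArbitraryValid for bool {
    fn arbitrary_valid() -> Self { false } // only 0 and 1 are valid
}

impl ArbitraryValid for char {
    fn arbitrary_valid() -> Self { 'a' } // must be a Unicode scalar value
}

impl ArbitraryValid for NonZeroU8 {
    // Zero is the niche, so any non-zero value will do.
    fn arbitrary_valid() -> Self { NonZeroU8::new(1).expect("1 is non-zero") }
}

impl<T: ArbitraryValid, const N: usize> ArbitraryValid for [T; N] {
    // An array is valid when each element is valid.
    fn arbitrary_valid() -> Self { std::array::from_fn(|_| T::arbitrary_valid()) }
}

/// Hypothetical "disarmed" replacement: safe, because the result is always valid.
fn disarmed_uninitialized<T: ArbitraryValid>() -> T {
    T::arbitrary_valid()
}

fn main() {
    let n: NonZeroU8 = disarmed_uninitialized();
    let flags: [bool; 3] = disarmed_uninitialized();
    println!("{n} {flags:?}");
}
```

Callers must not rely on which value comes back, only on the fact that it is valid for the type. This sketch covers a handful of types by hand, whereas the intrinsic would derive the answer from the compiler's own layout knowledge.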

Metadata

    Labels

    T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.
