
Decide on the validity invariant of integers, floats, bool, thin raw pointers, and char #439

Closed
@RalfJung


We have closed various issues discussing validity invariants for simple types (integers, float, bool, char, thin raw pointers). I'd like to have somewhere to point for team consensus, such as an FCP in this issue. :)

We decide

that the validity invariants are

  • integers, floats, thin raw pointers, and str need to be initialized
  • bool needs to be 0 or 1
  • char needs to be in 0..0xD800 or 0xE000..0x110000

Transmuting any provenance-free input that satisfies the above requirements is definitely allowed. In particular, integers can be transmuted to raw pointers without causing immediate UB. What can be done with those pointers in terms of memory accesses is a different question and not answered here.
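A minimal sketch of what this paragraph permits, assuming a 0x1000 address and a float bit pattern chosen purely for illustration. The helper names `int_to_ptr` and `bits_to_float` are hypothetical and not part of this issue. The pointer is only inspected, never dereferenced, since what may be done with it is explicitly left open here. Recent compilers may emit lints suggesting dedicated APIs in place of these transmutes.

```rust
use std::mem;

/// Hypothetical helper: reinterprets a provenance-free integer as a thin raw pointer.
fn int_to_ptr(addr: usize) -> *const u8 {
    // SAFETY: `usize` and `*const u8` have the same size, and an initialized,
    // provenance-free integer satisfies the validity invariant of a thin raw pointer.
    unsafe { mem::transmute::<usize, *const u8>(addr) }
}

/// Hypothetical helper: reinterprets an initialized `u32` as an `f32`.
fn bits_to_float(bits: u32) -> f32 {
    // SAFETY: every initialized 32-bit pattern is a valid `f32`.
    unsafe { mem::transmute::<u32, f32>(bits) }
}

fn main() {
    let p = int_to_ptr(0x1000);
    // Only the address is inspected; the pointer is never dereferenced.
    assert_eq!(p as usize, 0x1000);
    // 0x3f80_0000 is the IEEE 754 single-precision encoding of 1.0.
    assert_eq!(bits_to_float(0x3f80_0000), 1.0);
}
```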

We do not decide what happens when the input has provenance. This is tracked here. In particular, values such as &0 (that have provenance) might or might not be legal to transmute to integers.

Rationale

  • For the types with restricted range, we are using those ranges as niches for enum layout optimizations. bool and char have the same validity and safety invariant, which makes these types simpler to think about. The restricted range of char can also be exploited by Unicode algorithms, at least in principle.
  • Disallowing uninitialized values in integers is a prerequisite for optimizations that need integers to have a "stable" value (in LLVM terms: it lets us set noundef). For integers, floats, and thin raw pointers this choice also aligns the safety and validity invariants.
  • str is intended to behave like [u8] when it comes to language UB, so its validity invariant is made consistent with that of integers.
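The niche point in the first bullet can be observed with a small sketch. The helper `option_overhead` is a hypothetical name, and the sizes it reports are what current rustc produces rather than a layout guarantee made by this issue.

```rust
use std::mem::size_of;

/// Hypothetical helper: extra bytes that wrapping `T` in `Option` costs.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    // bool and char have invalid bit patterns, so `None` can be stored in one of them.
    assert_eq!(option_overhead::<bool>(), 0);
    assert_eq!(option_overhead::<char>(), 0);
    // Every initialized u8 is valid, so there is no niche and a tag byte is needed.
    assert!(option_overhead::<u8>() > 0);
}
```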

Examples

The following pieces of code cause UB (as in, the UB arises when executing the code, not just potentially later):

let _val: i32 = unsafe { MaybeUninit::uninit().assume_init() };
let _val: bool = unsafe { mem::transmute(2u8) };

The following pieces of code are well-defined:

let val: bool = unsafe { mem::transmute(1u8) };
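As a well-defined counterpart to the uninitialized-integer example above, here is a sketch in which the `MaybeUninit` slot is written before it is read. The function name `init_then_read` and the value 42 are illustrative assumptions.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes the slot first, then asserts initialization.
fn init_then_read() -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();
    slot.write(42);
    // SAFETY: `slot` was fully initialized by the `write` call above,
    // so the resulting i32 satisfies its validity invariant.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(init_then_read(), 42);
}
```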

The following is not decided by this FCP:

let ptr = &0i32;
let ptr_to_ptr = addr_of!(ptr).cast::<usize>();
let _ = unsafe { ptr_to_ptr.read() }; // pointer-to-integer transmutation -- UB or not?
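For code that only needs the address, an `as` cast sidesteps the open question, because a pointer-to-integer cast is a safe operation and not a transmute. This is a sketch of that alternative, not a resolution of the undecided case. The helper name `address_of` is hypothetical.

```rust
use std::mem::align_of;

/// Hypothetical helper: obtains the address of a reference via `as` casts.
fn address_of(r: &i32) -> usize {
    // Reference -> raw pointer -> integer, all in safe code.
    (r as *const i32) as usize
}

fn main() {
    let x = 0i32;
    let addr = address_of(&x);
    // References are non-null and aligned for their pointee type.
    assert_ne!(addr, 0);
    assert_eq!(addr % align_of::<i32>(), 0);
}
```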

The following functions are sound (as in, safe code invoking these functions can never have UB):

fn to_bool(x: u8) -> Option<bool> {
  if x < 2 { Some(unsafe { mem::transmute(x) }) } else { None }
}
fn from_bool(b: bool) -> u8 {
  unsafe { mem::transmute(b) }
}
fn check_bool(b: bool) {
  unsafe { to_bool(from_bool(b)).unwrap_unchecked() };
}

fn to_char(x: u32) -> Option<char> {
  if (0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x)  {
    Some(unsafe { mem::transmute(x) })
  } else {
    None
  }
}
fn from_char(c: char) -> u32 {
  unsafe { mem::transmute(c) }
}
fn check_char(c: char) {
  unsafe { to_char(from_char(c)).unwrap_unchecked() };
}

fn to_ptr<T>(x: usize) -> *const T {
    unsafe { mem::transmute(x) }
    // We don't decide here what may be done with this pointer,
    // but the transmute itself is fine, and since safe code
    // can't do anything with raw pointers, the function is even
    // sound.
}
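One way to gain confidence in the range checks used by `to_bool` and `to_char` is to compare them against the standard library's checked conversions. The sketch below repeats the two functions with explicit `unsafe` blocks and type annotations so that it compiles on its own. The comparison against `char::from_u32` is an illustrative check, not something this issue prescribes.

```rust
use std::mem;

fn to_bool(x: u8) -> Option<bool> {
    // SAFETY: x is 0 or 1, the only valid bit patterns for bool.
    if x < 2 { Some(unsafe { mem::transmute::<u8, bool>(x) }) } else { None }
}

fn to_char(x: u32) -> Option<char> {
    if (0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x) {
        // SAFETY: x lies in one of the two ranges that are valid for char.
        Some(unsafe { mem::transmute::<u32, char>(x) })
    } else {
        None
    }
}

fn main() {
    // Exhaustive over u8: only 0 and 1 may produce a bool.
    for x in 0..=u8::MAX {
        let expected = match x {
            0 => Some(false),
            1 => Some(true),
            _ => None,
        };
        assert_eq!(to_bool(x), expected);
    }
    // Covers both valid ranges, the surrogate gap, and the first value past the end.
    for x in 0..=0x11_0000u32 {
        assert_eq!(to_char(x), char::from_u32(x));
    }
    assert_eq!(to_char(u32::MAX), None);
}
```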

Prior discussion
