Description
We have closed various issues discussing validity invariants for simple types (integers, float, bool, char, thin raw pointers). I'd like to have somewhere to point for team consensus, such as an FCP in this issue. :)
We decide that the validity invariants are:
- integers, floats, thin raw pointers, and `str` need to be initialized;
- `bool` needs to be 0 or 1;
- `char` needs to be in `0..0xD800` or `0xE000..0x110000`.
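The `bool` and `char` invariants above can be written as plain predicates on the raw bits. The following is a minimal sketch; `is_valid_bool_byte` and `is_valid_char_bits` are hypothetical helper names used only for illustration. The `char` predicate can be cross-checked against the standard library's safe constructor `char::from_u32`, which accepts exactly the Unicode scalar values.

```rust
/// Hypothetical helper: a `bool` byte is valid only if it is 0 or 1.
fn is_valid_bool_byte(b: u8) -> bool {
    b <= 1
}

/// Hypothetical helper: a `char`'s `u32` bits are valid only if they lie in
/// `0..0xD800` or `0xE000..0x110000` (i.e. not a surrogate, not above 0x10FFFF).
fn is_valid_char_bits(x: u32) -> bool {
    (0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x)
}
```

Looping `x` over `0..=0x11_0000` and comparing `is_valid_char_bits(x)` with `char::from_u32(x).is_some()` should agree on every value, including the surrogate gap and the first value past the top of the range.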
Transmuting any provenance-free input that satisfies the above requirements is definitely allowed. In particular, integers can be transmuted to raw pointers without causing immediate UB. What can be done with those pointers in terms of memory accesses is a different question and not answered here.
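As a sketch of what "definitely allowed" covers: integers and floats have no restricted range, so any initialized integer can be transmuted to a float or to a thin raw pointer. The helper names `int_to_ptr` and `bits_to_f32` are hypothetical. The resulting pointer is deliberately never dereferenced, since that is the separate question left open above.

```rust
use std::mem;

/// Hypothetical helper: transmute a provenance-free integer into a raw pointer.
/// Only the transmute itself is claimed to be fine; the pointer is never dereferenced.
fn int_to_ptr(addr: usize) -> *const u8 {
    // SAFETY: every initialized `usize` is a valid `*const u8`.
    unsafe { mem::transmute::<usize, *const u8>(addr) }
}

/// Hypothetical helper: any initialized `u32` is a valid `f32` bit pattern.
fn bits_to_f32(bits: u32) -> f32 {
    // SAFETY: same size, and floats have no restricted range.
    unsafe { mem::transmute::<u32, f32>(bits) }
}
```

For example, `bits_to_f32(0x3F80_0000)` is `1.0`, and `int_to_ptr(0x1000) as usize` gives back `0x1000`.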
We do not decide what happens when the input has provenance; this is tracked in a separate issue. In particular, values such as `&0` (which have provenance) might or might not be legal to transmute to integers.
Rationale
- For the types with a restricted range, we use the values outside that range as niches for enum layout optimizations. `bool` and `char` have the same validity and safety invariant, which makes these types simpler to think about. The `char` invariant can also be exploited by Unicode algorithms, at least in principle.
- Disallowing uninitialized values in integers is a prerequisite for optimizations that need integers to have a "stable" value (in LLVM terms: it lets us set `noundef`). For int, float, and thin raw pointers this choice also aligns the safety and validity invariants. `str` is intended to behave like `[u8]` when it comes to language UB, so its validity invariant is made consistent with that of integers.
Examples
The following pieces of code cause UB (as in, the UB arises when executing the code, not just potentially later):
```rust
use std::mem::{self, MaybeUninit};
let _val: i32 = unsafe { MaybeUninit::uninit().assume_init() };
let _val: bool = unsafe { mem::transmute(2u8) };
```
The following code is well-defined:
```rust
use std::mem;
let val: bool = unsafe { mem::transmute(1u8) };
```
The following is not decided by this FCP:
```rust
use std::ptr::addr_of;
let ptr = &0i32;
let ptr_to_ptr = addr_of!(ptr).cast::<usize>();
unsafe { ptr_to_ptr.read() }; // pointer-to-integer transmutation: UB or not?
```
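For contrast (this is not something the FCP decides): converting a pointer to an integer with an `as` cast is an ordinary, well-defined operation in Rust. The open question above concerns only reinterpreting the pointer's bytes as an integer, via a transmute or a typed read as in the snippet. The helper name `addr_of_ref` is hypothetical.

```rust
/// Hypothetical helper: obtain the address of a reference with `as` casts.
/// This is a cast, not a transmute, so it is outside the question left open above.
fn addr_of_ref(r: &i32) -> usize {
    r as *const i32 as usize
}
```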
The following functions are sound (as in, safe code invoking these functions can never have UB):
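```rust
use std::mem;

fn to_bool(x: u8) -> Option<bool> {
    // SAFETY: `x` is 0 or 1, the only valid `bool` values.
    if x < 2 { Some(unsafe { mem::transmute(x) }) } else { None }
}

fn from_bool(b: bool) -> u8 {
    // SAFETY: every `bool` is a valid `u8`.
    unsafe { mem::transmute(b) }
}

fn check_bool(b: bool) {
    // SAFETY: round-tripping a valid `bool` always yields `Some`.
    unsafe { to_bool(from_bool(b)).unwrap_unchecked() };
}

fn to_char(x: u32) -> Option<char> {
    if (0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x) {
        // SAFETY: `x` is in one of the two valid `char` ranges.
        Some(unsafe { mem::transmute(x) })
    } else {
        None
    }
}

fn from_char(c: char) -> u32 {
    // SAFETY: every `char` is a valid `u32`.
    unsafe { mem::transmute(c) }
}

fn check_char(c: char) {
    // SAFETY: round-tripping a valid `char` always yields `Some`.
    unsafe { to_char(from_char(c)).unwrap_unchecked() };
}

fn to_ptr<T>(x: usize) -> *const T {
    // We don't decide here what may be done with this pointer,
    // but the transmute itself is fine, and since safe code
    // can't do anything with raw pointers, the function is even
    // sound.
    unsafe { mem::transmute(x) }
}
```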
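The soundness of `check_char` rests on every `char` converting to a `u32` inside the two valid ranges and back to the same `char`. The sketch below checks this exhaustively; it uses the standard library's safe `u32::from` and `char::from_u32` as stand-ins for `from_char` and `to_char`, and the name `check_all_chars` is hypothetical.

```rust
/// Hypothetical helper: for every `char`, check that its `u32` value lies in
/// `0..0xD800` or `0xE000..0x110000` and converts back to the same `char`.
/// Returns the number of `char` values visited.
fn check_all_chars() -> usize {
    let mut count = 0;
    // A `char` range iterates over Unicode scalar values only, skipping surrogates.
    for c in '\0'..=char::MAX {
        let x = u32::from(c);
        assert!((0..0xD800).contains(&x) || (0xE000..0x110000).contains(&x));
        // `char::from_u32` is std's safe counterpart of `to_char`.
        assert_eq!(char::from_u32(x), Some(c));
        count += 1;
    }
    count
}
```

The count is `0x110000 - 0x800 = 1_112_064`, the number of Unicode scalar values, so no `char` can make the `unwrap_unchecked` in `check_char` see `None`.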