Description
openedon Mar 5, 2024
As part of google/zerocopy#251, we're trying to teach zerocopy to reason in more precise ways about UnsafeCell
s. I'm making this issue as sort of an omnibus issue to clarify a number of questions regarding UnsafeCell
s. As always with zerocopy, I'm only interested in what can currently be promised, not what might be promised in the future.
In particular, my questions concern when it's sound for different code to disagree on whether a given memory region is covered by an UnsafeCell
.
Constraints
Based on Miri's behavior and APIs that already exist in the standard library, we can observe some constraints on what code is sound/unsound. In particular:
- Any time Miri treats code as unsound, I take that to mean that we cannot guarantee that it is sound even if it might be decided to be sound at some point in the future
- Just because Miri treats code as sound, it may not be guaranteed to be sound
- If a stable API exists which implies that code is sound, then it probably is guaranteed to be sound
It's unsound (under stacked borrows) to have multiple active references which "disagree" about whether a memory region is covered by an UnsafeCell
Status: can't assume sound
As soon as there are multiple references which are "active"* or "not shadowed" or whatever the proper terminology is, and these references disagree about whether a given memory region is covered by an UnsafeCell
, this is insta-UB according to stacked borrows even if the references are never used for anything. For example:
let u = 1usize;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) }; // Insta-UB
*Previously, this post used the term "live", but @CAD97 pointed out that "active" is the more accurate term.
Miri output
error: Undefined Behavior: trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
--> src/main.rs:6:27
|
6 | let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
| this error occurs as part of retag at alloc809[0x0..0x8]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1716> was created by a SharedReadOnly retag at offsets [0x0..0x8]
--> src/main.rs:6:30
|
6 | let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) };
| ^^^^^
= note: BACKTRACE (of the first span):
= note: inside `main` at src/main.rs:6:27: 6:74
It's still unsound even if the overlap is only "partial"
Status: can't assume sound
In this example, only some of the bytes go from covered by an UnsafeCell
to not covered.
#[repr(C)]
struct Foo(Cell<u32>, u32);
let u = 1u64;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) }; // Insta-UB
Miri output
error: Undefined Behavior: trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
--> src/main.rs:9:27
|
9 | let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
| this error occurs as part of retag at alloc809[0x0..0x8]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1716> was created by a SharedReadOnly retag at offsets [0x0..0x8]
--> src/main.rs:9:30
|
9 | let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };
| ^^^^^
= note: BACKTRACE (of the first span):
= note: inside `main` at src/main.rs:9:27: 9:64
It's no longer unsound if the UnsafeCell
region is of zero size
Status: might be able to guarantee sound
In other words, it's not the mere presence of UnsafeCell
that matters, but rather the bytes that are "covered" by UnsafeCell
s. For this reason, ZST UnsafeCell
s have no effect; this code is accepted by Miri:
// `Cell<()>` is a ZST
#[repr(C)]
struct Foo(Cell<()>, u64);
let u = 1u64;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };
It is sound to have multiple references disagree about what type a memory region has so long as those references agree on UnsafeCell
s
Status: probably guaranteed sound
For example, in both of the following cases, while the types are not the same, the byte ranges covered by UnsafeCell
s are the same.
let u = 1usize;
let u_ref = &u;
// Convert from `&usize` to `&isize`. Neither type has `UnsafeCell`s.
let _i_ref = unsafe { &*(u_ref as *const usize as *const isize) };
let cu = Cell::new(1usize);
let cu_ref = &cu;
// Convert from `&Cell<usize>` to `&Cell<isize>`. Both types have
// `UnsafeCell`s covering the same byte ranges.
let _ci_ref = unsafe { &*(cu_ref as *const Cell<usize> as *const Cell<isize>) };
It is sound to have multiple references disagree about UnsafeCell
s so long as only one of them is "active"
Status: guaranteed sound
In this example, I believe that what's happening is that u_mut
becomes inaccessible so long as _c_mut
exists, so there's no way to exercise the UnsafeCell
disagreement. Not only is this accepted by Miri in practice, it's also codified in the UnsafeCell
methods get_mut
(stable) and from_mut
(unstable nightly feature).
let mut u = 1usize;
let u_mut = &mut u;
let _c_mut = unsafe { &mut *(u_mut as *mut usize as *mut Cell<usize>) };
It is sound to read a non-UnsafeCell
location as an UnsafeCell
via ptr::read
Status: unclear
I'm not sure how to think about why this is sound, and whether it's guaranteed to remain sound.
let u = 1usize;
let u_ref = &u;
let _c = unsafe { ptr::read(u_ref as *const usize as *const Cell<usize>) };
It is sound to read an UnsafeCell
location as a non-UnsafeCell
via ptr::read
Status: unclear
I'm even less sure how to think about this one - my intuition tells me that Miri should reject this, but it doesn't.
let c = Cell::new(1usize);
let c_ref = &c;
let _u = unsafe { ptr::read(c_ref as *const Cell<usize> as *const usize) };
Guarantees
I would like to be able to rely on the following guarantees. For each guarantee, my questions are:
- Is this already documented somewhere as being guaranteed?
- If not, would it be acceptable (at least to T-opsem) to add documentation to guarantee it?
Sound given UnsafeCell
agreement
In particular: it is sound to create references to the same memory which disagree about type so long as they agree about which bytes are covered by UnsafeCell
. It is sound to use such references to perform loads and stores (assuming those loads and stores do not violate the referent types' bit validity).
These analyses only consider active references
For the purposes of the previous soundness guarantee, only "active" (i.e., non-shadowed) references are considered.* This allows us to prove that code which reborrows a &mut T
(with no UnsafeCell
s) as a &mut U
(with UnsafeCell
s) or vice versa (e.g., UnsafeCell::get_mut
) is sound.
I am aware that the formal definition of "active-ness" is up in the air, and subject to models like stacked borrows and tree borrows. I am assuming that there is some way we can formally define borrows so that we can get some minimum guarantees about this section despite not having completely agreed on the full model.
It is sound to read a non-UnsafeCell
location as an UnsafeCell
via ptr::read
Given &T
where T
doesn't contain any UnsafeCell
s, it is sound to convert &T
to *const U
(where U
does contain UnsafeCell
s) and use ptr::read
to read a U
out of that memory location (assuming that this doesn't violate U
's bit validity, read uninitialized bytes, read unaligned memory, etc etc).