-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Unsafe fields #3458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Unsafe fields #3458
Conversation
text/0000-unsafe-fields.md
Outdated
|
||
// Unsafe field initialization requires an `unsafe` block. | ||
// Safety: `unsafe_field` is odd. | ||
let mut foo = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure I like that the entire struct expression is now inside an unsafe block (though I’m not sure what a better syntax would be). If the safety invariant requires the entire struct, this makes sense. However, if it is more specific, a larger unsafe block is too broad as it also allows unsafe code usage to initialise the safe fields, which should get their own unsafe blocks. Though perhaps this could just be linted against, e.g. don’t use unsafe expressions in a struct initialiser without putting them inside nested unsafe blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to do this:
let mut foo = Foo {
safe_field: 0,
unsafe_field: unsafe { 1 },
};
That feels worse to me.
Presumably the safety invariant being promised is an invariant within the struct, even if it is not strictly required that that be the case.
Ideally we'd also have partial initialization, such that it would be possible to do
let mut foo = Foo { safe_field: 0 };
unsafe { foo.unsafe_field = 1; }
but I expect that's quite a bit farther away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that just unsafe in one field initialiser expression is insufficient as it means something very different. I do like however that it clearly shows which field the unsafety applies to.
It reminds me a bit of unsafe blocks in unsafe functions (#2585). Ideally, the fact that struct init is unsafe would not allow unsafe field init expressions. I doubt that special-casing struct inits to not propagate outer unsafe blocks would be backwards compatible, so a new lint would be the only avenue in this direction.
Some new syntax like this
let mut foo = unsafe Foo {
safe_field: 0,
unsafe unsafe_field: 1,
};
would communicate intent better but looks quite unnatural to me.
Perhaps in the future there might be explicit safe blocks to reassert a safe context within an unsafe block, so that it could be written as follows:
let mut foo = unsafe {
Foo {
safe_field: safe { /* some more complex expr here */ },
unsafe_field: safe { /* some more complex expr here */ },
}
};
which could be encouraged with clippy lints. Though for now just moving safe non-trivial struct field initialisers into variables that are initialised outside the unsafe block.
Overall, I think going with the original syntax of
let mut foo = unsafe {
Foo {
safe_field: 2,
unsafe_field: 1,
}
};
looks like a good and intuitive solution, though I still think that a lint against using unsafe expressions inside the struct expression without a nested unsafe block would still help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this?
let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
};
unsafe
would be followed immediately by a struct expression, and doing so would only indicate that unsafe fields may be initialized. It would not introduce an unsafe context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it - the syntax is still close enough to the unsafe block syntax (except when initialising tuple structs) that it's relatively intuitive what the unsafety applies to, but coupled strongly to the struct name so that it also makes sense why no unsafe context is introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's something nice about
let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
};
but to me that would go with unsafe struct Foo
, like how unsafe trait Bar
leads to unsafe impl Bar
.
Spitballing: if per-field safety is really needed, then maybe
let mut foo = Foo {
safe_field: 0,
unsafe unsafe_field: 1,
};
though maybe the answer is the same as for unsafe { unsafe_fn_call(complicated_expr) }
: if you don't want the expr in the block, use a let
.
After all, there's always field shorthand available, so you can do
let safe_field = unsafe { complex to construct };
let unsafe_field = easy and safe to construct;
unsafe { Foo { safe_field, unsafe_field } }
Perhaps the more important factor would be the workflow for the errors the programmer gets when changing a (internal and thus not semver break) field to unsafe
. Anything at the struct level wouldn't give new "hey, you need unsafe
here" if you already had another field that was unsafe.
Of course, if the safety is at the struct level (not the field level) then that doesn't come up.
Hmm, the "you added an additional requirement to something that's already in an unsafe block and there's no way to help you make sure you handed that" is a pre-existing problem that needs tooling for that everywhere, so maybe it's not something this RFC needs to think about.
If we had unsafe(aligned(a), nonzero(b)) { ... }
so that tooling could help ensure that people thought about everything declared in the safety section, then we'd just have unsafe(initialized(ptr, len)) { ... }
so that it acknowledged the discharge of the obligation for the type invariant, and the "I need 100 unsafe blocks" problem goes away.
I think the motivation could point out who would benefit from this. I assume it's library authors a making unsafe fields a "reminder to self" about upholding some invariant as opposed to say expecting a unsafe field in an API surface. I.e we still expect |
this seems closely related to |
Not necessarily. It is entirely reasonable that
Related, sure, but beyond the mention in unresolved questions, I'm not sure how it could be mentioned. Pretty much the only overlap is what's considered a "mutable access", which I didn't feel necessary to be copy-pasted. |
How does this look with functional struct update?
? Edit: or I guess it doesn't need unsafe if the source had been initialized with unsafe. |
@jsgf Great question. I don't have an immediate answer, though I believe the mechanism currently in the compiler would require the entire initializer to be unsafe. |
I love it! This is much better than getters and setters, both for library authors and users. typed-builder would have to adjust, but I would love to implement this feature there :) @idanarye (the main author) Maybe you have some input regarding the builder pattern? |
The implicit notion that only mutation is unsafe (and reading is not) seems tricky. Do you have reasoning to prove that we'll never need fields that are unsafe to read? Maybe there should be an alternative syntax proposal (like |
@mo8it I don't want to spam the comments here with a discussion about typed builder, so I've opened a discussion in my repository instead: idanarye/rust-typed-builder#103 |
Regarding the RFC itself - I think you are trying to solve a visibility issue with the safety mechanism, which is the wrong tool for the job. You gave But why? I mean, it's obvious why you don't want to give regular write access to I'm aware of the Another thing - conceptually it never makes sense to define only one field as let mut vec = Vec::new();
vec.len = 4; // UNSAFE!!! The why not this? let mut vec = Vec::from([1, 2, 3, 4]);
vec.buf = RawVec::new(); // perfectly safe apparently Yes, Whatever the semantics of unsafe fields will be - conceptually it makes more sense to put the |
text/0000-unsafe-fields.md
Outdated
## "Mutable use" in the compiler | ||
|
||
The concept of a "mutable use" [already exists][mutating use] within the compiler. This catches all | ||
situations that are relevant here, including `ptr::addr_of_mut!`, `&mut`, and direct assignment to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could argue that ptr::addr_of_mut!
on an unsafe
field need not be unsafe, because writes through the pointer are unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this in conjunction with the examples present in the RFC.
fn make_zero(p: *mut u8) {
unsafe { *p = 0; }
}
let p = ptr::addr_of_mut!(foo.unsafe_field);
make_zero(p);
Ignoring thread-safety, which can be easily achieved with locks, no single step appears wrong. make_zero
does not do anything wrong — assigning zero to an arbitrary *mut u8
is fine. Passing a pointer to the method is naturally okay. Yet it still results in unsoundness, as foo.unsafe_field
must be odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"assigning zero to an arbitrary *mut u8
is fine" what‽ No it is not fine‽ An arbitrary *mut u8
could be null, dangling, aliased... fn make_zero
is unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True — I typed that far too quickly and without thinking. Regardless, it's not immediately obvious to me that ptr::addr_of_mut!
should be allowed safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully convinced either way, but I fear it would just be confusing to require unsafe
for an operation that can't lead to unsoundness, especially as addr_of!(struct.field) as *mut _
would do the same thing with no unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially as
addr_of!(struct.field) as *mut _
would do the same thing
While you can do that, it's undefined behavior to actually mutate the resulting mut pointer. I've just confirmed this with miri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's UB under Stacked Borrows, but MIRIFLAGS=-Zmiri-tree-borrows
accepts it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's surprising. I'm not familiar with tree borrows, admittedly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr::from_mut(&mut struct).wrapping_offset(offset_of!(Struct, field))
should achieve the same thing as addr_of_mut!
, and I doubt you'd want to make offset_of
unsafe for unsafe fields. And even without offset_of
, you could use ptr::from_mut(&mut struct).wrapping_offset(addr_of!(stuct.Field) - ptr::from_ref(&struct))
. Both of these should be safe in every borrowing model.
Ain't clear this proposal handles auto-traits correctly, even if some use case exists. If you need this, then define your own type which provides this. We've the inner-builder or whatever deref polymorphism pattern, which comes up far more in practice, and can simulate this of desired.
|
How could a field of a struct be unsafe to read?
Within the module it's defined in (as the field is private), it is currently safe to assign any value, despite the fact that it can lead to undefined behavior. Said another way, the current behavior is inherently unsound. Nothing in the RFC so much as hints at
I never claimed that was the case.
Inclusion of one example does not mean that everything not included is forbidden. There is simply no reason to repeat the same thing for every field. Of course
For I have real world code where this is not the case. The
I don't follow. What problems do you see with handling auto traits? The type of the field is unchanged, so there would be no impact on auto traits. |
(@jhpratt I'd love for you to steal something like the text in this post for the RFC)
A huge incentive for it, to me at least, is helping avoid misunderstandings about the model. For example, a 2020 PACMPL paper contains the following statement:
Thus a huge win of this would be to avoid the (from the same paper)
Having the body of Especially combined with other accepted work like https://rust-lang.github.io/rfcs/2316-safe-unsafe-trait-methods.html we could start even doing things like clippy lints for "why is this unsafe when it doesn't do anything unsafe? Should one of the types involved be marked unsafe?" If we don't have to link to tootsie pops as often because changing things that are relied on by As another way to look at this, it's weird that when I'm writing a method on by type with a safety invariant that I can do Tidy should nag about a safety comment for the constructor too, so that I'm not disincentivized to use the other, correctly-marked- |
We do not need In other words, rust does not have unsafe types because you must enforce invariants at module boundaries anyways. Also various discussions in https://github.com/rust-lang/unsafe-code-guidelines clarify this point. A method like Anyways.. If I understand, you want this type:
It's similar to You still enforce invariants by visibility but
Why? All those Anecdotally, this type winds up being much more common:
And In fact, if you wrap the We made this a local type for visibility modifiers, so a language level construct helps here. Also conversely, if you do not require visibility modifiers then simple types like I suppose one might imagine |
EDIT: see below; it looks like the thing I was worried about here is probably impossible for other reasons. I do thing that "safe to read; unsafe to modify" is the 99%+ case, and should certainly be the default, but
Well, the field in Or the So perhaps some nuance for |
I don't think that is correct? The unsafe operation is dereferencing the pointer returned by |
Ah, I guess an access can't actually read an So I think the |
The nomicon describes current behavior; using it as an argument against this RFC is counter to the purpose of the RFC. It's circular reasoning at best.
The invariants are "merely" there for soundness. If the invariant is violated, the result is undefined behavior. That's far more serious than you make it sound.
I have never seen anyone write code like this in practice. The standard library and my own code (in @scottmcm I'll definitely include parts of that into the RFC. Also reading that blog post now — I'd never seen it before. |
This RFC is not better because memory safety also helps when writing unsafe code. The In other words, we always convert regular code invariants into memory safety assurance, but these regular code invariants have exactly the same risks as other regular code, including their own memory safety concerns. This RFC confuses the memory safety consumed in maintaining the regular code invariant with the actually unsafe options the code requires. In the past,
Anyways.. I think this discussion belongs in https://github.com/rust-lang/unsafe-code-guidelines where at least some people think formally about the unsafe code boundary. |
This is your fundamental misunderstanding. Other code in
Frankly, it's thinking like this that led to the creation of Rust. Thread safety can be maintained if everyone is super careful, but we all know how that works out. Likewise with a million other things. Programmers can not be relied upon to do the right thing. We have to force them to do it by leveraging the compiler wherever possible.
I have no idea why you think the discussion belongs there, particularly as you're the one that initiated it here. |
Yes, but this does not make altering What happens if I've unsafe code which relies upon an invariant between the values in a slice, so your unsafe field is a We often have this "catch your tail" phenomenon in formalalisms.
No. There is a formal model from the rustbelt project about how unsafe works, which gets discussed in the unsafe code guidlines repo. We should only expand what falls under unsafe code if the formal model says so. This RFC confuses those really important formal models with mere "read this" markers. In brief, you don't have a formal conception of when unsafe types should be used. It's unlikely one exists. That makes this change a regression towards the C days. |
It actually is. If something can result in undefined behavior, it is unsafe. Where that happens is wholly irrelevant. You're using circular logic regarding the UCG, which is by definition written about Rust as it currently is. You seem to be claiming that library undefined behavior doesn't exist, which is bizarre. I'll leave it at that as there's clearly no convincing you that the mere concept of an unsafe field has merit. I won't be responding further to anything along those lines. |
Could you add another motivating example to the RFC? It seems like the |
@tgross35 I linked this earlier in the thread. Is that sufficient for you? It demonstrates that field-level safety is appropriate, as some of the fields are perfectly safe to set as they have no interaction with any other field. Yet at the same time, all fields are one logical unit that should not be split into separate structs. |
Thank you for the link, I meant specifically to add an example to the RFC document (which perhaps you planned to do anyway). Even with that example, it does seem to me that it would be more correct to nest
Niche optimization was mentioned as well as a reason for not wanting to split off types, but I think this is more applicable to all separations of logic, and not just those with safety concerns. I don't wish to derail this conversation, but the thought has crossed my mind before about how Rust could potentially use a I think that investigating something like that may be more widely useful than using niche optimization as a justification for why a single flat struct is the correct solution. (edit: I brought this up for some discussion on Zulip https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/.60.23.5Bflatten.5D.60.20struct.20field.20attribute) |
text/0000-unsafe-fields.md
Outdated
breaking existing code. Over an edition boundary, safe reads of `unsafe` fields could be permitted | ||
by rewriting existing `unsafe` fields to wrap their field type in a tooling-aware `Unsafe` wrapper | ||
type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that changing the type of the field over an edition is viable. (However, I agree that safe-to-&
unsafe fields is something that can be left for later.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be sound, but not seamless. Since the Unsafe
wrapper is presumably #[repr(transparent)]
, its addition doesn't have opsem implications. Crate authors with public unsafe fields affected by this could not migrate editions without releasing a SemVer breaking change, since changing a public field's type is a SemVer breaking change.
I raise this all as a future possibility only for completeness — I don't think we'd walk through this door likely or lightly. I remain confident that the design proposed by this RFC offers the best tradeoffs.
- `unsafe(init,&mut,&,read)` (everything is unsafe) | ||
- `unsafe(init,&mut,&)` (everything except reading unsafe) | ||
- `unsafe(init,&mut)` (everything except reading and `&`-referencing unsafe) | ||
- etc. | ||
|
||
Besides the unclear semantics of an unparameterized `unsafe()`, this design has the disadvantage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I conceptualized it in my proposal at #3458 (comment): the argument to unsafe
denotes the weakest reference type that is unsafe
to obtain. So unsafe()
makes &
and &mut
references unsafe, while unsafe(mut)
only makes &mut
unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Rust ever gets partial borrows, the syntax for that could be adopted to extend this feature as well.
…ecker. Also discuss future possible interaction with partial borrows.
This alternative also inherits some of the disadvantages of [`Unsafe` wrapper | ||
types](#unsafe-wrapper-type); namely that the safety proofs needed to operate on an `Unsafe` wrapper | ||
value depend on the dataflow of the program; the wrapper value must be traced to its originating | ||
field so that field's safety documentation may be examined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the same is true of this RFC’s design. With this RFC, the unsafe
is required at the site of the copy/move, and to determine whether the safety condition is met, you must trace dataflow forward to where the value is later used. With Unsafe
wrappers, the unsafe
is required at the use-site, and dataflow must be traced back to the copy/move site. Both solutions require tracing dataflow, just in different directions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't that doubly true for Unsafe
? Unsafe
's accessors also require unsafe, forcing you to further trace the dataflow forward to check that uses don't violate the field invariant (which you discovered by tracing the dataflow backwards).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but you will probably not use the accessor until the last possible moment. So you are unlikely to have to trace forward very far.
According to @jswrenn this RFC has reached a fixed point. There are two leading alternatives. This RFC picks the simpler one of making exactly one kind of unsafe field, where all accesses are Nominating for lang team discussion to see if we can get to a quick consensus on this RFC. I'd like to hear from @scottmcm as the lang team champion before moving forward. @rustbot label I-lang-nominated |
block and attendant `SAFETY` comment is required. In most cases, this is a one-time chore: the | ||
maintainer can define a *safe* accessor (i.e., | ||
[`Vec::len`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.len)) that encapsulates this | ||
proof. However, in cases where multiple, partial field borrows are required, such an accessor cannot | ||
be invoked. [Future language extensions that permit partial borrows may resolve this | ||
drawback.](#partial-borrows). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with partial borrows, it may not be possible to write a safe helper method for moving out of a field (while leaving the rest of the struct intact).
I would definitely love to also have the future work of defining "unsafe to write but safe to read" fields. But I think this initial proposal as written is great and provides a good solution for many cases. @rfcbot merge |
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. |
The [`StructField` syntax][struct syntax], used for the named fields of structs, enums, and unions, | ||
shall be updated to accommodate an optional `unsafe` keyword just before the field `IDENTIFIER`: | ||
|
||
```diff | ||
StructField : | ||
OuterAttribute* | ||
Visibility? | ||
+ unsafe? | ||
IDENTIFIER : Type | ||
``` | ||
|
||
[struct syntax]: https://doc.rust-lang.org/stable/reference/items/structs.html#structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about TupleField
? (oversight? future extension? by-design unsupported because of ambiguity with struct F(unsafe fn());
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By-design unsupported because of ambiguity. Might be possible to disambiguate with parens, but that's future work.
I've not been following this discussion too closely. While I understand that there is the desire to have "unsafe to use" as the default, I would like to ensure that there is, at a minimum, syntax that the lang team finds acceptable for "safe to access" fields as I'd originally proposed. This section touches on it, but the only thing I see that even seems plausible is |
@jhpratt I'm taking the today off for my birthday, but I'll draft a Future Possibility tomorrow that sketches out the steps |
For clarity, I'm not disputing the need for "unsafe to use" by default — I understand the reasoning. I was solely asking for something like (happy birthday 🍰) |
@joshtriplett, @jhpratt I've now called out |
Co-authored by @jswrenn.
Fixes #381
Rendered
Tracking: