-
Notifications
You must be signed in to change notification settings - Fork 537
&str and &[u8] have the same layout #1848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@rustbot label: +I-lang-nominated +T-lang |
@rustbot label: +I-lang-easy-decision |
Unknown labels: I-lang-easy-decision |
FTR, the standard library has every right to make assumptions about the implementation of the language beyond what the language does guarantees, because it is intrinsically tied to rustc. Not necesssarily a point against making a decision here, but I don't think it's a strong point in favour of stabilizing the equivalence either. |
Agreed -- I actually am going to reword this to make it clear that I mean this isn't a change for Rustc, only a codification of existing decisions |
The layout for transmute doesn't matter. I guess the safety comment was about
That's only for safety. For correctness, we also need valid values of So I'm not sure guaranteeing that |
@ia0 What would you suggest for how the documented guarantees should be strengthened to match what the PR author obviously wants? |
I'm going to assume "what the PR author obviously wants" is "guarantees that transmuting between The problem is that the Reference doesn't yet have this concept. Ralf asked for such concept in #1752 (comment):
This was lost in the middle of a PR review so I guess it didn't get the attention it could have. But I'm assuming this request has gone through other channels given how critical it is for users writing unsafe code and relying solely on the Reference. The only type AFAICT that has a representation relation defined at this time is Today, one can somehow get close to the domain of the representation relation (i.e. which byte sequences are valid, but not which value they represent, which matters for unsafe code where some amount of correctness matters) by combining information about the type layout (its size, relative offset of fields, and discriminant representation, but not its alignment) and the bit validity. However the bit validity is currently often underspecified. For example I didn't find the bit validity of In my opinion, it would be better to wait until we have a notion of representation relation, such that all such guarantees for unsafe users can be specified in a uniform way. In the meantime, unsafe users should refer to the Unsafe Code Guidelines and other documentation like MiniRust, but this is somehow in opposition with rust-lang/unsafe-code-guidelines#566 (comment):
So maybe adding the representation relation to the Reference should be prioritized? |
I think, though that's a valid goal, your point is really to make a more rigorous definition of "layout", particularly using the term "representation relation" instead with some well defined meaning. A change like that, as you note, would require some large work in the spec (at the minimum, looking at all the usages of "layout") and so I think should come in as a separate PR/RFC. Particularly, I'm comfortable saying "whatever 'layout' means, &str and &[u8] have the same one." I can amend the PR to make that clearer. How does that sound? |
That would be an editorial question. I'm not saying that. I'm saying a new concept of "representation relation" (in addition to "layout") should be added. How that's implemented is up to editors of the Reference. There are at least 3 options:
But how would this solve the transmute problem? What transmute asks is that the sequence of bytes being transmuted is valid at both the source and destination type. The layout of those types is not what you need to satisfy this requirement. You need the validity invariant (aka the domain of the representation relation). The layout is neither sufficient (it needs bit validity) nor necessary (it talks about alignment). To be clear, I'm not against this change (it seems reasonable to me), I'm just arguing that we should not believe that it will solve the transmute problem. And thus we should document the motivation for this change (assuming changes to the Reference need to be motivated). |
When I say "layout" (and I suspect when the std says the same), I am including field offsets in that -- that is part of the validity of the transmute. I should be more careful in the text about that, but since "layout" is still used vaguely in the reference, i think it would be best to wait on that. So the transmute I referenced is saying "the offsets where the pointer and the length is stored are the same in &[u&] and &str". The std can currently say that is true as a point of fact and this RFC is to make that true as a point of reference. I might not be following you perfectly though -- you seem much more versed in programming language theory -- so let me know if I'm missing your point! |
Yes, layout is "size + align + field offset + discriminant" from layout.intro. I think the problem is on what follows.
Indeed,
This is not implied by In other words, while the sentence " So you could see this PR as a step towards proving the transmute with the Reference, but it's not a step perfectly aligned with that goal, because it also guarantees something about alignment which is not needed (and currently not guaranteed although true in practice now and most probably always). |
@ia0 Sorry, I do not think you are contributing anything further here. It seems to be an obviously preexisting problem. Please open a PR against the reference to address the concerns you have. |
Given the existing text that
then I think any concerns I'd have about what "layout" means exactly would also apply there, so overall I think this guarantee makes sense. That said, doing it just for I'll also cc rust-lang/rfcs#3775, which I think if it lands will necessarily make this guarantee as well. |
Exactly -- that mirrors what @kpreid referenced above too. Because all pointers have the same layout:
I think we don't need, therefore, to restate that in this section. I think, as @kpreid said, it would be best if we had a term like "all primitive pointers to However, since we don't currently have any rhetoric for "primitive pointer", I think that can be saved for another RFC which could then be backported to this subsection. In the meantime, I wouldn't mind adding link to https://doc.rust-lang.org/stable/reference/type-layout.html#r-layout.pointer.intro so readers understand that
Yes, I think if we're ready to pull the trigger on that RFC, then this PR is naturally included with that. However, assuming accepting that RFC is less immenent, accepting this PR first will simplify that RFC -- it will only have to talk directly about |
Currently,
str
and[u8]
are promised to have the same layout, but&str
and&[u8]
are not promised to have the same layout. The std currently assumes that they are promised to have the same layout (https://doc.rust-lang.org/src/core/str/converts.rs.html#172), so this change would have no impact beyond codifying what is already in practice. This PR defines&str
and&[u8]
to have the same layout, though what that layout is continues to be unspecified.There are some further steps here that I didn't take:
str
. I have addedstr
in several places in the reference where it otherwise refered to slices, but likely the definition of a slice should also simply includestr
. This is a bigger conversation and frankly unimportant if...str
into a libcore struct (redux) rust#107939 ever getts stabilized. In that case, all of this doesn't matter andstr
would be removed from the reference. This seems to me to be obviously the better choice.In any case, this PR represents a fairly incrementalist approach.
Thanks for the insight of those on the Zulip thread here