Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak RFC 246 with lessons learned from implementing #362

Merged
merged 4 commits into from
Oct 16, 2014

Conversation

alexcrichton
Copy link
Member

A few surprises were uncovered while working on RFC PR #246 (formerly known as RFC 69) with reference to globals referencing globals, and some clarifications about globals in patterns was added.

@arcto
Copy link

arcto commented Oct 7, 2014

I hope this is not too off-topic or too bikesheddy, but could we also address the naming conventions of the two different categories? I proposed on Discuss that const identifiers be written in CamelCase, just like enum variants, to convey that they too are values without a significant address.

From the RFC:

Constants are intended to behave exactly like a nullary enum variant

@alexcrichton
Copy link
Member Author

Currently we've been putting conventions like naming through RFCs as well (see some of @aturon's recent RFCs), and I imagine that a convention such as that would also need to go through the RFC process. It does seem pretty separate from this itself as this has more to do with the technical nuances of const rather than conventions around using it.

@petrochenkov
Copy link
Contributor

@alexcrichton
const arrays will be rvalues too? It could lead to some unfortunate gotchas.

const MY_TABLE: [(char, char), ..128] = [('a', 'A'), ('b', 'B'), ('c', 'C'), /*...*/];

fn do_something<T>(arg: &[T]) {}
do_someething(&MY_TABLE); // Oops, big temporary array is created
let b = MY_TABLE[10].1 == 'D'; // This copy is even more sneaky

And what about a common case - static strings?
const MY_LITERAL: &'static str = "Hello world!";
It is reasonable for MY_LITERAL itself to be a const rvalue, but is the storage referred to by MY_LITERAL const or static? It seems like it should be static (although it's not mentioned directly in the RFC), but it would prohibit a common technique of merging identical string literals.

As a possible extension, it is perfectly reasonable for constants to have generic parameters.
Note that this makes no sense for a static variable, which represents a memory location and hence must have a concrete type.

Interestingly, generic statics have a precedent in C++.

// struct template with static data member
template<typename T> S {
    static int i;
};
// Or even variable template in C++14
template<typename T>
int S_i;

All instantiations of S<T>::i or S_i<T> for different T's will be separate variables with unique addresses.

@alexcrichton
Copy link
Member Author

That is correct, large constant arrays likely don't want to actually be constants, they want to be a static instead (because they care about the memory location they're assigned to).

Note, however, that this is not a problem for slices because the value of a slice is actually just two pointers. In the case of a const value of type &'static str, the "inlined value" is actually just two words, where one of the words is a pointer into rodata.

@mahkoh
Copy link
Contributor

mahkoh commented Oct 8, 2014

Given that the rust PR says that the const implementation behaves like #define, what's the point behind keeping the type after the name?

## Patterns

Today, a `static` is allowed to be used in pattern matching. With the
introduction of `const`, however, a `static` will be forbidden from appearning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"appearing"

@alexcrichton
Copy link
Member Author

Added a note about disallowing interior borrows of statics as learned from @pcwalton's review.

@alexcrichton alexcrichton self-assigned this Oct 9, 2014
@Zoxc
Copy link

Zoxc commented Oct 15, 2014

@alexcrichton Can we avoid inlining const without UnsafeCell and leave that up to LLVM so large constant won't be penalized?

Also can the lifetime of const be 'static when the type doesn't contain UnsafeCell? So you need not needlessly create constant references.

@thestinger
Copy link

A strong -1 to going further down this road from me. Performance / size concerns / compile-time have been completely ignored and the justification for removing unnamed_addr from static has not been made. To be clear, it would not have interfered with mutability. Duplicating even relatively small constants across the code is a significant compile-time and code size regression. It will impact performance when this missteps result in more bloat in the cache.

Inlining the definitions across crate boundaries does not interfere with mutability or even significant addresses if available_externally is used, so the removal of that was not justified by facts either. Maybe it makes sense to remove it by default, but the case to do it was not made.

@thestinger
Copy link

C++ has mutable globals and constants with the ability to have inner mutability in those constants. The need to treat those constants as mutable global variables internally is an implementation detail. There is no reason that those implementation details need to include lack of unnamed_addr and inability to do inlining across crates. It does not interfere with mutability. An insignificant address does not imply the ability for the compiler to simply merge globals without checking. Inlining a definition does not break mutability and it does not even give it an insignificant address if it is implemented correctly. In fact, LLVM will only merge globals after it has proven they are constant (unless it is told up-front).

@thestinger
Copy link

I also think the fact that const is an rvalue is another strange mistake. It could have just been defined sanely by taking the old static and forbidding extern { const ... } along with forbidding borrowing a const containing UnsafeCell. Hurting code generation in order to obtain inferior semantics makes little sense...

@pcwalton
Copy link
Contributor

We could implement const by actually reserving space in rodata if we wanted to, no?

I don't see how the "expand in place" behavior of const is an implementation detail any more than the opposite is for static. Being able to get rid of the #[address_insignificant] wart alone makes the whole thing worth it.

@thestinger
Copy link

It can work exactly like static with the restriction that it can't reference an external symbol. Whether a static or const is actually marked as constant in the LLVM IR is an implementation detail. Internal mutability doesn't require the restrictions and code generation flaws that are being imposed on everyone.

Being able to get rid of the #[address_insignificant] wart alone makes the whole thing worth it.

Making this the default / only option was already accepted earlier but not yet implemented. Having an insignificant address is in no way incompatible with mutable data. Neither is inlining of definitions thanks to the semantics of available_externally.

@pcwalton
Copy link
Contributor

Making #[address_insignificant] the only option doesn't seem tenable to me.

@thestinger
Copy link

Why? It's already the only option for functions and there isn't a use case for significant addresses. Defining a type and using type_id already works fine as a way of getting a unique value.

@thestinger
Copy link

I actually don't think there's a point of having const at all. The compiler can be taught to forbid constant expressions referring extern statics and everything will work fine. It add complexity, performance issues and semantic restrictions all due to a lack of understanding of the LLVM global semantics AFAICT...

@pcwalton
Copy link
Contributor

I don't see how you solve the fundamental problem of "sometimes I want the actual value of my constant to be part of my interface, but sometimes I want the actual value of my constant to be part of my implementation" in, for example, a dynamic linking scenario without a distinction between const and static.

That is, what if I want to change my constant without breaking downstream crates? (Assume we had a stable ABI here)

@thestinger
Copy link

static mut: Anything = expr -> LLVM global with unnamed_addr
static: ContainsUnsafeCell = expr -> LLVM global with unnamed_addr
static: NoUnsafeCell = expr -> LLVM constant with unnamed_addr

The compiler is free to inline the definitions across crates for all globals, including mutable ones, as long as they are marked available_externally. It's pointless to do it for ones that are constant since it won't be able to make assumptions about the data, so it could be restricted to static without internal UnsafeCell.

Guaranteeing that the static data is usable in constant expressions requires forbidding extern statics from being used along with addresses that are variable at runtime. For example, taking the address of a static can be be permitted in a constant expression but pointer -> integer casts can not be permitted in general.

@thestinger
Copy link

@pcwalton: The same problem exists for functions (manual cross-crate inlining is a wart - the compiler can and should do it automatically without much effort) and the solution is already #[inline(never)]. The existence of const doesn't change anything because it's not a viable replacement for static. Duplicating the value everywhere without a reusable address or the ability to get &'static T makes it a crippled feature.

@thestinger
Copy link

(and there's no reason for unnamed_addr to be opt-out/opt-in or only applied to true constants)

@thestinger
Copy link

Going down this road actually means we need to keep a way of marking unnamed addresses forever, because const is utterly incapable of replacing static. There's no sane reason to duplicate large objects / arrays everywhere instead of just defining one and reusing it. However, without unnamed_addr on static by default we need a way to opt-in to it for code that cares about bloat.

I don't think there's actually a valid reason to want to opt-out if it was the default so that would eliminate the complexity. Defining types makes more sense as a way to create a unique id since it doesn't imply overhead by default. You only pay for it when you explicitly make use of type_id to implement logic based on unique ids, which is going to be incredibly rare.

@pcwalton
Copy link
Contributor

Manual cross-crate inlining is not a wart and is part of the design of Rust. The fact that the guts of a non-generic function can be changed without breaking downstream callers by default (i.e. without opt-in) is an important feature of C and C++, and Rust needs to support that. Separate compilation is important for many use cases; if you require #[inline(never)] for it then you've essentially killed it, because few people are going to type that.

The const vs. static dichotomy is the natural extension of this idea to globals. Having to use inlining attributes to get the distinction is too confusing (as this RFC states), so a more fundamental lexical distinction was chosen.

To be clear, I would have no problem with supporting &'static on constants if there is indeed a way to do that properly. However, I think that saying that const vs. static is an unnecessary distinction is going too far, because it blurs the line between interface and implementation too much.

@pnkfelix pnkfelix changed the title Tweak RFC 69 with lessons learned from implementing Tweak RFC 246 with lessons learned from implementing Oct 15, 2014
@nikomatsakis nikomatsakis merged commit 14bd7da into rust-lang:master Oct 16, 2014
@thestinger
Copy link

The fact that the guts of a non-generic function can be changed without breaking downstream callers by default (i.e. without opt-in) is an important feature of C and C++, and Rust needs to support that.

Rust doesn't have a stable ABI for many reasons including features like type_id and Any. Those those issues were rejected as backwards compatibility issues for 1.0 so it's unlikely that it will offer the ability to expose a stable ABI.

Even with all of those issues fixed, altering a generic function or the private representation of a public type will break the ABI. Exposing a stable ABI requires putting a lot of care into the design of every type and function. For example, types will either need to stick with a frozen representation (possibly with unused padding fields for future expansion) or use the PIMPL idiom via Box<T>.

Very few crates are going to be designed to have a stable ABI, and it doesn't make sense to force manual inline annotations in crates without one because it can be done by the compiler with a bit of effort put into it. It would be simple enough to offer the ability to opt-out of automatic cross-crate inlining.

To be clear, I would have no problem with supporting &'static on constants if there is indeed a way to do that properly. However, I think that saying that const vs. static is an unnecessary distinction is going too far, because it blurs the line between interface and implementation too much.

There is no reason for const to be an rvalue that's duplicated everywhere. It could be an lvalue with the borrow checker forbidding borrows of a const containing UnsafeCell. This would allow defining constants of any size without risk of duplication. Inlining would occur across crate boundaries via available_externally so only one symbol would ever be generated. It would also allow borrowing normal constants without inner mutability as &'static references which is required for many use cases previously covered by static.

@thestinger
Copy link

As usual, an RFC by a Mozilla developer is accepted without addressing any of the concerns. The RFC continues to make many false claims. I don't see how this process has any legitimacy at all.

@pcwalton
Copy link
Contributor

type_id and Any leading to an unstable ABI is an implementation detail that we can change later. The rest of your comments apply equally well to C and C++ and having a stable ABI "by default" leads to a lot of business value in the real world (Qt, WebKit, etc.)

@thestinger
Copy link

It being an rvalue instead of an lvalue is not an implementation detail. There is no good reason for crippling it like that. The decision to make static variable have significant addresses lacked justification. Despite it being brought up many times, misunderstanding about unnamed_addr and inlining of definitions via available_externally made their way into the RFC. Neither of these is incompatible with mutable data.

@pcwalton
Copy link
Contributor

The fact that const is currently duplicated is an implementation detail and can be improved in the compiler without changing the language.

I addressed your concerns in my previous message and implying malice is inappropriate. This is technical disagreement.

@thestinger
Copy link

It's not technical disagreement. There has been no justification for it being an rvalue instead of an lvalue. There has been no justification for making statics have significant addresses. As I have stated countless times without being listened to, insignificant addresses are entirely orthogonal to mutability and there is no reason to avoid unnamed_addr due to mutability. You're continuing to cherry pick a few words from my comments without addressing my points.

@thestinger
Copy link

Since it doesn't allow taking an &'static reference (due to being an rvalue) it will never be a full replacement for the old static even with all kinds of workarounds for the poor design in the implementation. This was also coupled with crippling the ability to optimize code using static by explicitly defining it as having a significant address. The impact of unnamed_addr is entirely compatible with mutability and will not result in merging solely based on the value of the initializer.

@pcwalton
Copy link
Contributor

I agree that I do not see a need to make constants rvalues (and that was in fact something I brought up at the meeting). However, the decision was made because this design is a purely conservative improvement to the current system. Making constants lvalues is an improvement that can be made later.

The ABI stability, and more importantly, semantic difference between interface and implementation, issue, however, is something that was considered important. Evidently you do not consider that important. That is a simple technical disagreement, and the core team decided this way on the issue.

@thestinger
Copy link

The ABI stability, and more importantly, semantic difference between interface and implementation, issue, however, is something that was considered important. Evidently you do not consider that important. That is a simple technical disagreement, and the core team decided this way on the issue.

You're going out of your way to misrepresent my statements my statements. Here is what I just wrote:

Rust doesn't have a stable ABI for many reasons including features like type_id and Any. Those those issues were rejected as backwards compatibility issues for 1.0 so it's unlikely that it will offer the ability to expose a stable ABI.

Even with all of those issues fixed, altering a generic function or the private representation of a public type will break the ABI. Exposing a stable ABI requires putting a lot of care into the design of every type and function. For example, types will either need to stick with a frozen representation (possibly with unused padding fields for future expansion) or use the PIMPL idiom via Box<T>.

Very few crates are going to be designed to have a stable ABI, and it doesn't make sense to force manual inline annotations in crates without one because it can be done by the compiler with a bit of effort put into it. It would be simple enough to offer the ability to opt-out of automatic cross-crate inlining.

Since you rejected the bugs I filed about ABI compatibility as relevant to backwards compatibility, I find your claim that it's me who doesn't care about it pretty laughable.

@thestinger
Copy link

Making constants lvalues is an improvement that can be made later.

It would be a backwards incompatible change. It's possible to move out of an rvalue but not out of an lvalue. It would make a lot more sense to define an incredibly basic const fn system to serve the role of generating non-copyable values. It's not going to possible to fix these mistakes in the future without breaking compatibility.

However, the decision was made because this design is a purely conservative improvement to the current system.

The decision to make static addresses significant was a regression to the current system without a rationale. The significance of the address is entirely orthogonal to mutability. Rust doesn't have significant addresses for globals even without unnamed_addr due to zero size types, and mutable globals would work just fine if they did have significant addresses.

Using types and type_id to obtain unique values has zero overhead, unlike making a bunch of globals and using their address as a unique identifier.

@pcwalton
Copy link
Contributor

I'm sorry, but I don't understand the point you're trying to make (and the hostility is making it even more difficult). My opinion that ABI stability is a worthwhile post-1.0 goal. That implies that we should not make decisions that make that difficult or impossible. That said, the fact that the implementation of certain APIs that are not going to be stable at 1.0 are currently precluding ABI stability does not prevent us from reaching that goal. Inlining all functions across crates and allowing the definitions of constants to be inlined would, however.

On a meta-point, I don't think this RFC is the best place to discuss whether ABI stability should be important to Rust. I think that deserves a more holistic discussion. Much of the disagreement seems to stem from a difference of opinion regarding whether we should care about ABI stability. (I should note, however, that I find the argument about interface versus implementation to be compelling--for statics, the precise value is not considered part of a crate's interface, while for constants, it is; this gives your library users an important clue as to whether they should be relying on the value to change or not).

@pcwalton
Copy link
Contributor

That's a good point regarding backwards compatibility of making constants lvalues, but we could just promote constants to rvalues if moving would produce an error. (Moving out of a constant is rare anyway because they're usually implicitly copyable, as they're static.)

The rationale for making addresses significant, as I recall, was to eliminate the hazard when people do use statics as unique identifiers (as they will anyway regardless of what we say). I personally don't really care too much one way or another.

@thestinger
Copy link

Statics essentially have insignificant addresses even without unnamed_addr because zero-size types always have a meaningless address and Rust is allowed to turn any type into a zero-size via optimizations because struct layout is undefined. It's fully permitted to remove unused fields.

@pcwalton
Copy link
Contributor

Presumably Rust is not allowed to do that for statics. (That should be clarified in the RFC.)

@thestinger
Copy link

Losing no overhead zero-size statics would be bad because zero-size types are a useful abstraction, as is defining a static with one. It could have very specific language forbidding the removal of unused fields from a struct used in a static but it would essentially need to be a recursive promise (the types of the fields matter too) which would remove a lot of the intended power offered by implementation-defined representations.

@pcwalton
Copy link
Contributor

Do you have a use case for a zero-sized static? Like I said I don't have strong feelings either way but I am curious what the use case is that couldn't be better done another way. I suppose you could have a struct in a static that is never used (so all its fields get optimized out) but then you might as well remove the static as well! (Unless you're only taking the static's address--but that is precisely where having significant addresses helps.)

@thestinger
Copy link

Do you have a use case for a zero-sized static?

They're useful for implementations of traits without state. It allows obtaining an &'static T where T takes no space but satisfies a generic type bound.

Unless you're only taking the static's address--but that is precisely where having significant addresses helps.

Defining a type and type_id would accomplish the same thing without wasted space. If there was a lint for address leaks (for ASLR) then *T as int would cause a warning anyway. It's not a very good way to get a unique id.

@pcwalton
Copy link
Contributor

&'static on a constant should definitely work (and I know I brought that up at the meeting re. this issue), just like &[] is 'static. So there's no need to use a static for that.

@thestinger
Copy link

If you can obtain a &'static reference to const with no UnsafeCell inside (i.e. it is an lvalue) then I think it would be perfect for what it wants to accomplish (other than bugs like rust-lang/rust#18294) but then I don't see why you'd ever want to put constant data in a static so I would expect a warning. The existence of static mut is kind of weird since you get the same semantics with UnsafeCell in a static, although it would be nice if #[thread_local] static mut: ... = ...; acted like a Cell and allowed safe read/write but only unsafe borrows.

@arielb1
Copy link
Contributor

arielb1 commented Oct 25, 2014

Don't you need to be able to move out of constants for mutex initializers to work?

@Centril Centril added A-typesystem Type system related proposals & ideas A-patterns Pattern matching related proposals & ideas A-machine Proposals relating to Rust's abstract machine. A-const Proposals relating to const items A-static Proposals relating to static items. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const Proposals relating to const items A-machine Proposals relating to Rust's abstract machine. A-patterns Pattern matching related proposals & ideas A-static Proposals relating to static items. A-typesystem Type system related proposals & ideas
Projects
None yet
Development

Successfully merging this pull request may close these issues.