Explain how Vec::with_capacity is faithful #99790
There are concerns that the doc changes in rust-lang/rust@95dc353 are breaking changes in the promised API. In this commit, I explain in more detail the exact promise that Vec::with_capacity is really making: Vec is trying to act unsurprising by relaying information faithfully to the allocator, it is not committing to internal details of Vec itself beyond that. As it happens, we don't get useful capacity information from allocators, but once upon a time Rust did capacity recalculation from available data, and we should reserve the right to do so again if it seems profitable and correct. This path avoids adding a duplicate `with_capacity_exact` to Vec's already-formidable API surface.
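A minimal sketch of the one property everyone in this thread seems to agree on: after `Vec::with_capacity(n)`, the reported capacity is at least `n`, and filling the first `n` slots does not reallocate. The helper name `fill_without_realloc` is hypothetical and used only for illustration; the check deliberately does not assume `capacity() == n`.

```rust
/// Hypothetical helper for this sketch: requests room for `n` elements,
/// pushes exactly `n` elements, and reports whether the buffer stayed put.
pub fn fill_without_realloc(n: usize) -> bool {
    let mut v: Vec<u32> = Vec::with_capacity(n);
    // The reported capacity may exceed `n`, so only check the lower bound.
    let reported = v.capacity();
    let before = v.as_ptr();
    for i in 0..n {
        v.push(i as u32);
    }
    // Pushing within the reported capacity must not move the allocation
    // or change the reported capacity.
    reported >= n && v.as_ptr() == before && v.capacity() == reported
}
```

Calling `fill_without_realloc(5)` should return `true` whichever way the "exactly" versus "at least" question is settled, because it only relies on the lower bound.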
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
r? @thomcc (rust-highfive has picked a reviewer for you, use r? to override)
I'm in favor of this, for sure. I'm also not certain if this is a contractual change, or a clarification of the existing contract. I'd lean toward the former, but it's close enough that I'll reassign to someone on libs-api (e.g. someone who can start an FCP about it), just to be sure. r? @m-ou-se
The current contract is somewhat inconsistent because the documentation (as of Rust 1.63) currently says here:
But in
> `vec![x; n]` and `Vec::with_capacity(n)` produce a `Vec` that allocates `n` capacity.
This should probably read "at least `n` capacity" if this is how `Vec` behaves. Otherwise it could be left as is. But I'm not sure about the grammar here.
> `Vec` has preferred either answer at different times and may change again.
I don't think it's necessary to explain how things changed in the past, so I would remove the sentence above. The previous sentence ("In that case, `capacity` may return either the requested capacity or actual allocated size.") is already clear enough in my opinion.
> However, `Vec::with_capacity(5)` will not deliberately "round up" to `Vec::with_capacity(8)` for any non-zero-sized type, to respect programmer intent.
I'm not as fond of giving an example here (and if an example is given, it should be phrased with "for example"). Maybe better to phrase it like the following:
"However, `Vec::with_capacity(n)` will not deliberately "round up" in anticipation of the `Vec` growing beyond `n` elements. It thus behaves like `Vec::new` followed by `Vec::reserve_exact`, but may be faster."
I didn't check how `Vec` actually behaves. This still needs to be reviewed. I just wanted to comment on the wording here.
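To make the proposed wording concrete, here is a rough sketch of the two construction paths it compares. The function names `via_with_capacity` and `via_reserve_exact` are hypothetical, and the sketch only assumes what both APIs document: the resulting capacity is at least `n`. Whether the two always report the same capacity is exactly the point still under review, so the code does not rely on it.

```rust
/// Hypothetical helper: request room for `n` bytes up front.
pub fn via_with_capacity(n: usize) -> Vec<u8> {
    Vec::with_capacity(n)
}

/// Hypothetical helper: start empty, then ask for exactly `n` more slots.
pub fn via_reserve_exact(n: usize) -> Vec<u8> {
    let mut v = Vec::new();
    // `reserve_exact` asks for the minimum capacity for `n` more elements,
    // but the allocator may still hand back more than was requested.
    v.reserve_exact(n);
    v
}
```

Both helpers return an empty `Vec` whose `capacity()` is at least `n`; the difference the wording hints at is only that the first path does it in a single step.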
@JanBeh noticed that too and opened an issue which this PR would close: #101316
I have the power to close issues. @dpc: An allocator for a system may deliberately choose to match, byte for byte, the requested allocation size, due to being run on a platform that has e.g. different RAM constraints. It is my understanding that the glibc malloc implementation does this for larger objects (not for tiny ~16 byte allocations, but for kilobytes, where the slight overhead of tracking the size is more beneficial than overusing memory), and is widely considered to be of reasonable performance, especially across the variety of different hardware glibc is deployed on. So the design of things like jemalloc is not a universal good, and I believe it is inappropriate for Rust to adopt in a default container, in an explicit API. Designing Vec in such a way that the API naturally works with either being bound against jemalloc-like or glibc-like allocators seems a far superior choice.
Hi, I'm the original author of #96173. Since this accidentally apparently caused a breaking change I feel I should probably start by apologising for the extra confusion I caused. I am sorry. With the already suggested change of "with exactly the requested capacity" -> "with capacity at least I'm still a little unclear on the libs team's consensus (if there is one) on the intended behaviour here (having read the various issues linked above, as well as zulip). If this is the wrong place to discuss that, please do let me know where that discussion can be more helpful. Generally (this discussion for non-ZSTs for the moment), the user can request capacity
My main motivating example for moving to 2 is mentioned in #95614, where you need to break a vec into raw parts (e.g. for FFI) and then re-construct. The docs for Assuming this is all right and there is consensus on it, then this behaviour is already incorrect, is it not? The comment states that allocators "currently" return the requested size, but the |
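For readers unfamiliar with the raw-parts pattern mentioned above, a minimal sketch follows. It round-trips within one function rather than across a real FFI boundary, and the name `roundtrip_raw_parts` is hypothetical. The relevant detail is that the capacity passed to `Vec::from_raw_parts` is the value read back from `capacity()`, not the number originally passed to `with_capacity`.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical helper: split a `Vec` into (pointer, length, capacity)
/// and rebuild it, as one might do on either side of an FFI call.
pub fn roundtrip_raw_parts(v: Vec<i32>) -> Vec<i32> {
    // Prevent the original `Vec` from freeing the buffer when it goes out of scope.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: `ptr`, `len`, and `cap` come unchanged from a live `Vec<i32>`
    // whose destructor has been suppressed, so ownership of the buffer is
    // transferred exactly once.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}
```

The sketch stays correct however `capacity()` is defined, as long as the reported value is the one handed back to `from_raw_parts`.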
I do not believe that suggestion applies, although I am not entirely sure what line people are suggesting it for, given people are talking inexactly about substrings instead of highlighting them in the UI. What this attempts to explain is that Vec acts on the programmer's intention, and that sometimes this constitutes a surprising reality, i.e. that Vec, the programmatic entity, allocates with that capacity, and then the Allocator decides the capacity.
Apologies, I now understand your intention. I've made some suggested changes which I believe make that intention more explicit.
/// `vec![x; n]` and [`Vec::with_capacity(n)`] produce a `Vec` that allocates `n` capacity.
/// `vec![a, b, c, d, e]` produces a `Vec` which allocates once for all items (in this case 5).
/// An allocator may return an allocation with a size larger than the requested capacity.
/// In that case, [`capacity`] may return either the requested capacity or actual allocated size.
/// `Vec` has preferred either answer at different times and may change again.
/// However, `Vec::with_capacity(5)` will not deliberately "round up" to `Vec::with_capacity(8)`
/// for any non-zero-sized type, to respect programmer intent.
///
/// Excess capacity an allocator has given `Vec` is still discarded by [`shrink_to_fit`].
/// If <code>[len] == [capacity]</code>, then a `Vec<T>` can be converted
/// to and from a [`Box<[T]>`][owned slice] without reallocating or moving the elements.
/// `Vec` exploits this fact as much as reasonable when implementing common conversions
/// such as [`into_boxed_slice`].
Here are some suggestions for re-wording based on your stated intent, which I hope read a little clearer with a clearer distinction between "requested", "allocated", and "reported" capacities (none of which are guaranteed to equal any other).
Para 1: These are the ways of constructing a `Vec` with a specific capacity; they provide this guarantee on the allocation request.
Para 2: The allocator may return more space, and `Vec` is not guaranteed to report its capacity as exactly the requested or the allocated capacity.
/// `vec![x; n]` and [`Vec::with_capacity(n)`] produce a `Vec` that allocates `n` capacity;
/// that is, they request a capacity for `n` elements from the allocator. Similarly,
/// `vec![a, b, c, d, e]` requests an allocation to cover the number of given elements (in this case 5).
/// `Vec` is guaranteed to request exactly these capacities and not "round up" the allocation request
/// to speculatively avoid potential future allocations in these cases, to respect programmer
/// intent.
///
/// Any allocator may return an allocation with a size larger than the requested capacity, so
/// the allocated capacity may exceed the requested capacity. The reported capacity, as returned
/// by [`capacity`], is guaranteed to be at least the requested capacity and not more than the
/// allocated capacity, but is not guaranteed to be either. So if the programmer requests a
/// capacity of `n`, the allocator will be asked for `n` but may allocate space for `m >= n`, and
/// [`capacity`] may therefore also return `c >= n` (but `c <= m` is guaranteed).
///
/// Excess capacity an allocator has given `Vec` is still discarded by [`shrink_to_fit`].
/// If <code>[len] == [capacity]</code>, then a `Vec<T>` can be converted
/// to and from a [`Box<[T]>`][owned slice] without reallocating or moving the elements.
/// `Vec` exploits this fact as much as reasonable when implementing common conversions
/// such as [`into_boxed_slice`].
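As a small illustration of the last paragraph of the doc text, the sketch below drops excess capacity and converts to and from an owned slice. The helper names `to_boxed` and `back_to_vec` are hypothetical. The explicit `shrink_to_fit` call is there only to show the step; `into_boxed_slice` discards excess capacity on its own.

```rust
/// Hypothetical helper: discard spare capacity, then convert to `Box<[u8]>`.
pub fn to_boxed(mut v: Vec<u8>) -> Box<[u8]> {
    // Shown for illustration; `into_boxed_slice` also drops excess capacity.
    v.shrink_to_fit();
    v.into_boxed_slice()
}

/// Hypothetical helper: convert an owned slice back into a `Vec`.
pub fn back_to_vec(b: Box<[u8]>) -> Vec<u8> {
    // `into_vec` reuses the existing allocation instead of copying elements.
    b.into_vec()
}
```

A `Vec` built with `with_capacity(16)` holding three elements comes back from this round trip with the same three elements and a capacity that is only guaranteed to be at least 3.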
☔ The latest upstream changes (presumably #120121) made this pull request unmergeable. Please resolve the merge conflicts.
@workingjubilee if you can resolve the conflicts we can push this forward. Thanks
Closes #99385.