Description
Not sure this is the right place to put this, but @ubsan asked for it:
Semantic divergences from current rustc are in bold. Unresolved questions are (XXX: question).
Arenas
In the code examples here, assume this struct has been declared:
struct NoisyDrop(&'static str);
impl Drop for NoisyDrop {
fn drop(&mut self) {
println!("dropping {}", self.0);
}
}
Arenas are used to ensure orderly and deterministic destruction of local variables and temporaries (XXX: buffer temporaries created for order-of-evaluation).
During its execution, a Rust function manages a stack of arenas, pushing new arenas to it and popping them.
Newly-created locals and temporary locations are always allocated at the end of an arena, but the arena need not be the topmost arena on the stack, as in example (1).
When an arena is popped, the locations within the arena are destroyed in the reverse order of their allocation. However, parts of a location that are in a deinitialized state are not
destroyed, as in example (2).
Lifetime and region checking treats the destruction of an arena as through each location was destroyed separately, in order, but subject to implementation limitations.
The location allocated for a local/temporary (the "alloca") is valid from the moment it is allocated until just after the value inside it is destroyed when it is popped, as through it was an &move
reference.
To simplify implementation, each arena in Rust can contain only a sequence of locations
whose type and size are known at compile-time. Of course, this does not imply that an arena
is stored in that order in memory.
NOTE: arenas are sometimes called "destruction scopes", but in rustc destruction scopes do not include binding arenas, so that term would be confusing. (XXX: Bikeshed!)
The arena tree
The arenas in a function are structured as a tree defined by the program's structure:
- When a function begins, the function's root arena is created and pushed. That arena
is popped when the function exits. - The following expression places push an expression arena when control enters them and pop it when it exits them:
- The initializer of a
let
-statement, if any - The expression of an expression statement, edit 2017-12-25 with or without a semicolon
- The body of a
match
arm - The guard of a
match
arm, if any - The RHS of a logical
&&
or||
expression - The condition of an
if
orwhile
expression
- The initializer of a
- The following block places push a block arena when control enters them and pop it when it leaves them:
- The body of a function
- The body of a
loop
orwhile
loop - The arms of an
if
-expression.
- Each block, match guard, or match arm pushes a binding arena when it is entered (XXX: do we want after? Does it matter?). The arena is popped along with the statement's containing block's.
Remember that if let
, while let
and for
are defined by their desugaring in terms of loop
and match
.
Observe that the tail expression of a block is under the block's binding arena but none of the other arenas
Before a parent arena is popped, all of its children are popped in LIFO order.
Local variables
Local variables, or bindings (XXX: AFAICT the terms are used interchangeably in rustc - do we want to change this?) created by a let
-statement are allocated into the statement's containing block's binding arena. The local variables are allocated before the statement's initializer executes.
Local variables created by argument buffers (see example (3)) and argument bindings are allocated in the function's root arena. The order in which they are allocated is that each argument's patterns are allocated followed by the argument's buffer, from left to right.
Local variables created by a match
arm are allocated twice - once when the guard is executed, within the guard's binding arena (which will not call a destructor because bindings in guards must either be Copy
or ref mut t
, but will release the binding's storage) (XXX: pending final status of borrowck in guards) and once within the arm's binding arena (XXX: this differs a bit from the current behavior, but the current behavior is unsound and not something I would like to keep - cc @eddyb @nikomatsakis).
Temporaries
Temporaries that are created by a temporary lexpr borrow-extended by a let-statement are allocated within that let-statement's containing block's binding arena. Other temporaries are allocated into the topmost non-binding arena on the stack when they are created (see Example (1)).
If the pattern of a let
-statement contains a by-ref binding, the root lexpr of the let
-statement's expression is borrow-extended (see Example (4)).
If an address-of expression is distinguished subexpression of a let
-statement, the root lexpr of the address-of expression's subexpression is borrow-extended (see Example (5)).
The following expressions are distinguished subexpression of a let
-statement:
- The expression of that
let
-statement - The subexpression of a address-of, cast, or parentheses that is a distinguished subexpression of that
let
-statement. - The fields (XXX: but not the FRU base) of a dictionary expression, tuple expression, or array expression (XXX: but not repeat expression) that is a distinguished subexpression of that
let
-statement. - The tail of a block expression that is a distinguished subexpression of that
let
-statement.
(XXX: this is just documenting the current implementation. Should we do better - see the merged but unimplemented RFC: Better temporary lifetimes (so e.g. .as_slice() works) rfcs#66?).
In some other cases that have yet to be documented (XXX: document them).
No other temporaries are borrow-extended (e.g. type annotations do not matter - see rust-lang/rust#36082).
Example (1) - basic arenas
fn basic() {
let x = NoisyDrop("local `x`");
(NoisyDrop("temporary"), ()).1
}
Here there are 4 arenas:
root arena
- function block arena
- function block binding arena
- `let` expression arena
Here, the local x
is allocated in the binding arena, but the (NoisyDrop("temporary"), ())
temporary skips the binding arena and is instead allocated into the function's block arena,
and is destroyed after the NoisyDrop("local
x")
.
Example (2) - conditional drop
fn condi_drop(init_flag: bool, fini_flag: bool) {
let (init, fini);
fini = NoisyDrop("fini");
if init_flag {
init = NoisyDrop("init");
}
if !fini_flag {
std::mem::forget(fini);
}
}
Here, the locations of the locals init
and fini
are allocated from the same arena
in that order. Therefore, when the function is exited, fini
is destroyed followed
by init
.
At least, that is the case if both are initialized - when both flags are true
. If
init_flag
is false
, then init
is never initialized and therefore not destroyed, and
if fini_flag
is false
, then fini
is deinitialized without its destructor being run
by mem::forget
, and therefore it is not destroyed.
Example (3) - argument buffers
fn argument_buf((x1,_,z1): (NoisyDrop, NoisyDrop, NoisyDrop),
(x2,_,z2): (NoisyDrop, NoisyDrop, NoisyDrop)) {
}
fn main() {
argument_buf((NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1")),
(NoisyDrop("x2"), NoisyDrop("y2"), NoisyDrop("z2")),);
}
Here, the first argument's bindings x1
and z1
are allocated first, followed by the first argument's buffer (which contains the triple (NoisyDrop("x1"), NoisyDrop("y1"), NoisyDrop("z1"))
). Then, the same happens for the second argument.
Afterwards, all but the middle 2 fields of the buffers are moved to the argument bindings, so only the middle NoisyDrop
's destructor is called when the buffer is destroyed.
Everything is allocated onto the function's root arena and dropped in reverse order, leading to
dropping y2
dropping z2
dropping x2
dropping y1
dropping z1
dropping x1
being printed
Example (4) - borrow extension by ref pattern
fn main() {
let ref foo = (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0;
let _ = NoisyDrop("external");
}
The root lexpr of (NoisyDrop("extended 1"), NoisyDrop("extended 2")).0
is the lexpr (NoisyDrop("extended 1"), NoisyDrop("extended 2"))
, which is extended to the binding arena and therefore dropped after the unextended let
.
This prints
dropping external
dropping extended 1
dropping extended 2
Example (5) - borrow extension by designated expression
fn main() {
let _x : (&_, NoisyDrop) = (
{&[&[NoisyDrop("extended 1")]]},
NoisyDrop((&NoisyDrop("unextended"), "extended 2").1)
);
drop(NoisyDrop("external"));
}
This is a rather complicated case. The arena tree is
root arena
- function block arena
- function block binding arena
- let expression arena
- second statement expression arena
First, storage for _x
is allocated in the block binding arena. Next, the first 2 address-of expressions are distinguished subexpressions of the let
-statement, so the 2 arrays are allocated in that binding arena too. The third address-of expression is inside a function call, so it is not a distinguished subexpression, and is allocated within the let expression arena.
After the let-statement completes, the let expression arena is dropped, printing "dropping unextended".
Then, the second statement prints "dropping external".
Afterwards, the binding arena is dropped in reverse order of pushing - first the temporary is dropped, printing "dropping extended 1", and the _x
is dropped, printing "dropping extended 2".
The overall output is
dropping unextended
dropping external
dropping extended 1
dropping extended 2