Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify Box<T> representation and its use in FFI #62514

Merged
merged 8 commits into from
Dec 12, 2019
21 changes: 21 additions & 0 deletions src/liballoc/boxed.rs
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,27 @@
//! T` obtained from `Box::<T>::into_raw` may be deallocated using the
//! [`Global`] allocator with `Layout::for_value(&*value)`.
//!
//! `Box<T>` has the same ABI as `&mut T`. In particular, when `T: Sized`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the key point here that it has the same ABI/representation as a C pointer? It feels like there is a "logical step" missing in this sentence -- e.g., Box<T> is the same as &mut T, which in turn is the same as a raw pointer or C pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like:


So long as T: Sized, a Box<T> is guaranteed to be represented as a single pointer and is also ABI-compatible with C pointers (i.e., the C type T*). This means that you have Rust code which passes ownership of a Box<T> to C code by using Box<T> as the type on the Rust side, and T* as the corresponding type on the C side. As an example, consider this C header, which declares functions that create/return some kind of Foo value:

...

These two functions might be implemented in Rust as follows. Here, the struct Foo* types from C are translated to Box<Foo>, which captures the ownership constraints. Note also that the nullable argument to destroy_foo is represented in Rust as Option<Box<Foo>>, since Box<Foo> cannot be null.

...

Even though Box<T> has the same representation and C ABI as a C pointer, this does not mean that you can convert an arbitrary T* into a Box<T> and expect things to work. Box<T> values will always be fully aligned, non-null pointers. Moreover, the destructor for Box<T> will attempt to free the value with the global allocator. In general the best practice is to only use Box<T> for pointers that originated from the global allocator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikomatsakis No problem for the delay and thank you for the thorough review.

The original commit that started this PR didn't mention references and focused on C pointers, but previous reviews recommended changing that.

I integrated your wording (slightly edited). Please let me know if that's ok. Thank you.

//! this allows using `Box<T>` in FFI:
RalfJung marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure this does not get lost: This should talk about alignment, say that a Box is always required to be aligned even when it is not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't each instance of a type required to be aligned according to its type? Box<T> must own a valid T. Doesn't that entail that the T instance must be properly aligned? The example code also does not explicitly indicate that the pointer passed to bar must have been returned by a previous call to foo or be null, and does not mention that passing some random memory address is also undefined. Box::from_raw does not mention alignment either.

If you think that it is important to state nonetheless, does that wording look right: "A *mut T pointer used as Box<T> is always required to be aligned, even when not dereferenced."? Thank you.

Copy link
Member

@RalfJung RalfJung Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box<T> must own a valid T. Doesn't that entail that the T instance must be properly aligned?

Validity of T is defined in terms of which sequences of bytes form a valid T; there is no such thing as an "alignment" of sequence of bytes. Alignment only comes up when you consider that sequence of bytes stored in memory.

But you are right that being aligned is not sufficient: For a *mut T to be legal to turn into a Box<T>, we need more conditions, like validity of T. We could link to these docs but they are not great either. But this section sounds useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explaining. Please let me know if you would like me to add a link to this other section of the docs or anything else to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR actually appends documentation to this "Memory layout" section, so I think a (self referential) link is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to agree with @stephaneyfx that a self-referential link doesn't make sense. I think the key point is that you can "round-trip" a box through to a C function (where it is a C pointer) and back again. In this case, the pointer will naturally obey all the criteria. It's not really safe to take some arbitrary C pointer and cast it to a box unless it came from the global allocator (which in turn guarantees its alignment). Maybe we can reword the "Box is interopable" with FFI to emphasize this use case? (Which is precisely the one that the example shows)

//!
//! ```c
//! /* C header */
nikomatsakis marked this conversation as resolved.
Show resolved Hide resolved
//! struct Foo* foo(); /* Returns ownership */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Returns" is ambiguous - can be construed as returning ownership to the function. I would say "Returns ownership to the caller." or "Gives ownership to the caller.".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper C syntax is struct Foo* foo(void);.

//! void bar(struct Foo*); /* `bar` takes ownership */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment has the format "bar takes.." but the comment above is not "foo returns ...". So to be consistent, would change the comment to "Takes ownership from the caller.".

//! ```
//!
//! ```
//! #[repr(C)]
//! pub struct Foo;
//!
//! #[no_mangle]
//! pub extern "C" fn foo() -> Box<Foo> {
//! Box::new(Foo)
//! }
//!
//! #[no_mangle]
//! pub extern "C" fn bar(_: Option<Box<Foo>>) {}
//! ```
//!
//! [dereferencing]: ../../std/ops/trait.Deref.html
//! [`Box`]: struct.Box.html
Expand Down