Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI-stabilize core::task::Waker #3404

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

p-avital
Copy link

@p-avital p-avital commented Mar 28, 2023

Implementing FFI-safe futures for stabby has given me the opportunity to find out just how hard Wakers are to deal with when attempting to pass futures across the FFI boundary.

This RFC aims to make dealing with them not only much easier, but also much more performant than currently possible, removing a tree-sized splinter from the thumb of projects that need to pass futures across the FFI boundary.

Disclaimer: this RFC proposes breaking API for completeness' sake only: my firm opinion is that Rust's backward compatibility guarantees should be protected, and the proposed non-breaking solution's runtime cost, while existent, is likely to become negligible as soon as the common executors switch to the newly proposed constructor for waker vtables.

Rendered

@workingjubilee
Copy link
Member

A library break in types is, generally, strictly unacceptable, and has already been foreclosed by previous RFCs, unless the matter concerns a soundness violation or it is the only path forward for a language evolution. Editions are not permitted to change the standard library API in the way you propose, because of the Epoch RFC that defined them:

To be crystal clear: Rust compilers must support all extant editions, and a crate dependency graph may involve several different editions simultaneously. Thus, editions do not split the ecosystem nor do they break existing code.

@p-avital
Copy link
Author

p-avital commented Mar 28, 2023

A library break in types is, generally, strictly unacceptable, and has already been foreclosed by previous RFCs, unless the matter concerns a soundness violation or it is the only path forward for a language evolution. Editions are not permitted to change the standard library API in the way you propose, because of the Epoch RFC that defined them:

To be crystal clear: Rust compilers must support all extant editions, and a crate dependency graph may involve several different editions simultaneously. Thus, editions do not split the ecosystem nor do they break existing code.

Then it's a good thing that's not the one I consider best (or even good). The proposed implementation PR opts for the stable union implementation

@workingjubilee
Copy link
Member

Indeed. I have thoughts about that as well but I figured I would give you a heads up that an RFC proposing anything that even resembles a breakage (or even a stabilization that cannot later be undone!) should be responsive and consider the concerns enumerated in RFC 2502, RFC 1105, and RFC 1122.

@p-avital
Copy link
Author

I'll edit the RFC to remove the API break option, since it really isn't one :)

@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Mar 29, 2023
Copy link

@oxalica oxalica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea to freeze Waker fns to extern "C" fn.

  1. Enforcing extern "C" in vtable benefits only FFI scenario. More usual non-FFI code doesn't really benefits from it.
  2. Instead, non-FFI code needs to write extra extern "C", and are deoptimized due to implementation limitation: extra indirection call, missed extern-Rust specific future optimization and etc.
  3. It also introduces future compatibility concerns, since we would never change the ABI anymore even after edition bumps (ABI-stabilize core::task::Waker #3404 (comment)).


Futures have now become a core feature of Rust, and returning boxed futures from trait-methods has become a very common pattern when working with `async`.

However, due to Rust's unstable default ABI, trait-objects are never FFI-safe (note that this doesn't stop users trying). Multiple crates (such as [`abi_stable`](https://crates.io/crates/abi_stable) and [`stabby`](https://crates.io/crates/stabby)) attempt to solve this problem by providing proc-macros that generate stable v-tables for the traits they annotate.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth mentioning async-ffi which covers exactly the FFI-compatible waker scenario.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't aware of this one, I shall look it up this instant :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking it out, it uses the first technique I envisioned for stabby: allocating a new Waker on poll. stabby now only allocates on the first call to waker.clone()

Comment on lines +53 to +58
unsafe extern "C" fn clone_adapter(clone: unsafe fn(*const ()) -> RawWaker, data: *const ()) -> RawWaker {
clone(data)
}
unsafe extern "C" fn other_adapter(other: unsafe fn(*const ()), data: *const ()) {
other(data)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces indirection penalty to all existing ordinary async code. It sounds unacceptable to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I see no other way of guaranteeing that calling unsafe fn is safe to call as soon as futures can come from the FFI (even if they themselves are ABI-stable).

I think the penalty is negligible considering that the adapters will likely stay in cache, and wakers typically have work that involves either locking or lock-less data structures, both being likely to strongly outweigh the cost of that indirection. I arguably don't have any measure to back this assumption up.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the penalty is negligible considering that the adapters will likely stay in cache, and wakers typically have work that involves either locking or lock-less data structures, both being likely to strongly outweigh the cost of that indirection. I arguably don't have any measure to back this assumption up.

It's still not zero-cost for majority non-FFI users.

Also if you think extra indirection is acceptable, then so does the indirection penalties of current vtable wrapping technique. But that is one of reasons why you want this RFC.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will study async-ffi further to see if their technique bypasses that, but my issue with the current waker isn't the indirection so much as the need to allocate upon cloning the waker... I'll come back to you once I've explored this again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of async-ffi, the allocation happens as soon as poll is called, regardless of whether poll would've clone the waker or not.

text/0000-stable-wakers.md Outdated Show resolved Hide resolved
@p-avital
Copy link
Author

p-avital commented Mar 29, 2023

I don't like the idea to freeze Waker fns to extern "C" fn.

  1. Enforcing extern "C" in vtable benefits only FFI scenario. More usual non-FFI code doesn't really benefits from it.
  2. Instead, non-FFI code needs to write extra extern "C", and are deoptimized due to implementation limitation: extra indirection call, missed extern-Rust specific future optimization and etc.

I'm not particularly attached to extern "C", as long as the calling convention is stable, any would do, but I'm no expert in these. Would fastcall be good? it should keep the *const () in register which, I think, is as good as we can have it. Nothing I can do about indirection with this proposal though...

  1. It also introduces future compatibility concerns, since we would never change the ABI anymore even after edition bumps (#3404 (comment)).

Don't all constructors on structures that may at some point require new fields that can't be defaulted have the same future compatibility concern? Arguably this one is more subtle due to the fact that fields shouldn't be reordered once done, but that's a minor concern with fields this uniform.

Alternative proposal (throwing spaghetti at the wall here):

Wakers could be made generic over the vtable layout

trait WakerVTable {
  type CloneFn: Fn(*const ()) -> RawWaker<Self>;
  type OtherFn: Fn(*const ());
  unsafe fn clone_fn(&self) -> Self::CloneFn;
  unsafe fn wake_fn(&self) -> Self::OtherFn;
  unsafe fn wake_by_ref_fn(&self) -> Self::OtherFn;
  unsafe fn drop_fn(&self) -> Self::OtherFn;
}
// RawWaker => RawWaker<VTable: WakerVTable = RawWakerVTable>
// Waker => Waker<VTable: WakerVTable = RawWakerVTable>
// Context => Context<VTable: WakerVTable = RawWakerVTable>
// Future => Future<VTable: WakerVTable = RawWakerVTable>
impl WakerVTable for RawWakerVTable {
  type CloneOutput = RawWaker;
  type CloneFn = fn(*const ()) -> Self::CloneOutput; // not sure how to have the unsafe here...
  type OtherFn = fn(*const ());
  unsafe fn clone_fn(&self) -> Self::CloneFn {unsafe {crate::mem::transmute(self.clone)}
  // ...
}

This way, it's possible to design new vtable layouts, the state machine generation can stay entirely identical, code that uses Waker and Future stays identical, but is only compatible with the current, FFI-unsafe wakers.

But that way, new code can be written to support other v-table types, such as StableWakerVTable, which could use any chosen calling conventions and repr(C).

Support for FFI-safe wakers would be incremental: executors would need to have new methods to spawn futures, which would have to either be stored separately or in an enum; hand-written futures would need to add the new generic to their impl of Future to be compatible with the FFI-safe wakers. But overall, I don't think this breaks anything, and doesn't cost anything to projects that are OK with not being FFI-safe.

@p-avital
Copy link
Author

@oxalica could you weigh in on that "alternative proposal" just above? This would even let alternative vtable layout such as the stable one be provided by external crates, removing the need for core to actually have a stable vtable.

Combined with waker_getters, this would give all the flexibility required to do plenty of nice things.

I might need mentoring to implement a prototype though, because the future_trait lang item requires 0 generics, preventing me from trying to make Future<VTable> on my branch, and I have no idea how to deal with that (this being my first attempt at contributing to Rust, I'm completely new to the process, sorry)

(also, sorry for pinging you, I'm not sure if you'd rather I not in the future, do let me know)

@clarfonthey
Copy link

This is an extremely out-there suggestion, but perhaps there could longer-term be some mechanism for Rust to interpret certain Rust-ABI functions as actually being another ABI, and modifying compilation accordingly?

Basically, since Rust's ABI is explicitly unstable (and probably will be forever, instead relying on explicit ABI annotation for stable ABIs), we essentially have free reign to choose the ABI depending on the circumstances. So, we could have the ability to say "actually, the functions inside Waker should use the C ABI internally" so that functions without the C ABI are the ones which incur the branch penalty, and C-ABI functions are free.

Of course, this is probably not a very useful optimisation, but I figured I'd ponder here if that would be a potential solution.

@p-avital
Copy link
Author

@clarfonthey Doing this automatically sounds weird to me, but I'm currently writing another, much more ambitious (to ambitious for me to stand any chance of implementing it without a lot of mentoring) RFC that's similar: what if when running rustc, you could ask it to use a given layout/ABI combination for all things that don't have an explicitly specified one? This would let people instantly make all of their library ffi-safe, at the cost of losing out on the latest optimizations. As better layouts and ABI are found, and provided the compiler team is willing to support them, this would let people use the most recently stabilized efficient layouts globally, instead of needing to mark all types down to core like they currently have to

@workingjubilee
Copy link
Member

workingjubilee commented Mar 30, 2023

We currently have -Zrandomize-layouts, so there's some precedent for "flags that do goofy things to Rust layouts". I'm not sure a "rewrite as this other stable one" would be approved as a stable feature, but...

I'm not particularly attached to extern "C", as long as the calling convention is stable, any would do, but I'm no expert in these. Would fastcall be good? it should keep the *const () in register which, I think, is as good as we can have it. Nothing I can do about indirection with this proposal though...

extern "fastcall" is the x86 Microsoft calling convention, it has nothing to do with anything else, honestly (and thus isn't necessarily all-that well-defined on other platforms...), and it's prooobably a bit of a bug that you can use it when Windows isn't involved.

@p-avital
Copy link
Author

@workingjubilee Thanks for kinda confirming my suspicion, since most of the calling conventions listed in the nomicon are Microsoft calling conventions, I was wondering which ones could really be treated as stable, and how the Microsoft ones were translated on other systems.

I see you've reacted to the Future<VTable: WakerVTable =core::task::RawWakerVTable> proposal, I actually think Future<Waker: Wake + Clone = core::task::Waker> would much simpler (and would allow some ZST shenanigans too).

There's a papercut: assuming you have a Future that's completely generic over the Waker (be it handwritten or generated by an async block), functions that return them opaquely would need an extra generic argument to support the new generic Waker:

fn returns_impl_future<Waker>() -> impl Future<W> {...}
fn returns_boxed_future<Waker>() -> Box<dyn Future<W>> {...}

Presumably, async fn could keep on returning an anonymous type that has all the impls, so this shouldn't be too much of an issue, but would this be an issue for async-traits?

Still, that seems to me like an actually much better proposal than this current one, as it actually expands the scope of how executors would be allowed to work. Should this be a new RFC?

@p-avital
Copy link
Author

This is kinda off-topic for this RFC, but just throwing another idea out there: right now, the documentation states that ABI is unstable including between two runs of the same compiler on the same machine:

  1. How far from the current truth are people who assume that despite the guarantee not being stated, two runs of the same compiler on the same machine will produce ABI-compatible binaries? Do we know what factors influence:
    a. Layout generation: under what circumstances would the same struct be compiled with differing layouts?
    b. Calling conventions: what influences the calling convention of fn? Argument number/size?
  2. How feasible would it be for the compiler to summarize those factors?

The point I'm trying to reach is: would the compiler be able to provide a form of hash which, barring hash collision, would identify the way it would represent types as well as what ABI it would use for fn?

If such a hash were to exist, we could at least make the "blessed compiler" approach truly safe: for example, Rust could (provided it was asked to) add an undefined symbol _rustc_abi_hash_<hash> to all cdylibs for which the improper_cdecl_type lint triggers, and always define this symbol in executables, such that linkage would fail if there's a mismatch.

How to have Rust consistently use the same ABI is possibly a whole other can of worms, but this could allow safe usage of FFI-unsafe Rust across the FFI-boundary, even if producing artifacts that match may be complex/unfeasible depending on the version/flags used.

@programmerjake
Copy link
Member

If such a hash were to exist, we could at least make the "blessed compiler" approach truly safe: for example, Rust could (provided it was asked to) add an undefined symbol _rustc_abi_hash_<hash> to all cdylibs for which the improper_cdecl_type lint triggers, and always define this symbol in executables, such that linkage would fail if there's a mismatch.

that would imho improperly limit existing cdylibs that do stuff like so:

#[derive(Debug, Default)]
pub struct MyState { // type acts like extern type to all users of this cdylib
    state: String, // something that triggers improper_cdecl_type
}

#[no_mangle]
pub extern "C" fn alloc_my_state() -> Box<MyState> {
    Box::new(MyState::default())
}

#[no_mangle]
pub extern "C" fn free_my_state(_v: Option<Box<MyState>>) {}

#[no_mangle]
pub extern "C" fn modify_my_state(this: &mut MyState, a: u32) {
    this.state += char::from_u32(a).unwrap();
}

#[no_mangle]
pub unsafe extern "C" fn debug_my_state(this: &MyState, f: unsafe extern "C" fn(*const u8, usize, *mut ()), f_state: *mut ()) {
    let s = format!("{this:?}");
    unsafe { f(s.as_ptr(), s.len(), f_state) }
}

@p-avital
Copy link
Author

If such a hash were to exist, we could at least make the "blessed compiler" approach truly safe: for example, Rust could (provided it was asked to) add an undefined symbol _rustc_abi_hash_<hash> to all cdylibs for which the improper_cdecl_type lint triggers, and always define this symbol in executables, such that linkage would fail if there's a mismatch.

that would imho improperly limit existing cdylibs that do stuff like so:

#[derive(Debug, Default)]
pub struct MyState { // type acts like extern type to all users of this cdylib
    state: String, // something that triggers improper_cdecl_type
}

#[no_mangle]
pub extern "C" fn alloc_my_state() -> Box<MyState> {
    Box::new(MyState::default())
}

#[no_mangle]
pub extern "C" fn free_my_state(_v: Option<Box<MyState>>) {}

#[no_mangle]
pub extern "C" fn modify_my_state(this: &mut MyState, a: u32) {
    this.state += char::from_u32(a).unwrap();
}

#[no_mangle]
pub unsafe extern "C" fn debug_my_state(this: &MyState, f: unsafe extern "C" fn(*const u8, usize, *mut ()), f_state: *mut ()) {
    let s = format!("{this:?}");
    unsafe { f(s.as_ptr(), s.len(), f_state) }
}

Indeed, Box<MyState> would trigger the lint and thus the symbol insertion... Then just providing some way of inserting the symbol (like a macro) and/or a const in core would do the trick, but require explicit declaration on both sides, which is fine I think

@workingjubilee
Copy link
Member

How far from the current truth are people who assume that despite the guarantee not being stated, two runs of the same compiler on the same machine will produce ABI-compatible binaries? Do we know what factors influence:
a. Layout generation: under what circumstances would the same struct be compiled with differing layouts?
b. Calling conventions: what influences the calling convention of fn? Argument number/size?

Optimization levels can impact effective ABI.
Sorry you had to find out this way. 😅

@workingjubilee
Copy link
Member

since most of the calling conventions listed in the nomicon are Microsoft calling conventions, I was wondering which ones could really be treated as stable, and how the Microsoft ones were translated on other systems.

As far as I know, for a given architecture, specifying one of the Windows ABIs either uses the specified Windows calling convention, if there is a version of that Windows CC for the architecture, or it just... doesn't make any sense at all and may produce... unexpected results.

@p-avital
Copy link
Author

p-avital commented Mar 31, 2023

Optimization levels can impact effective ABI. Sorry you had to find out this way. 😅

That's actually exactly what I expected, thanks for confirming that, imma gonna add that to our canary for our projects (that currently use the "blessed compiler" approach: the fact that our user only complain when they use a different version of the compiler leads me to think we got lucky a lot with how they set opt-level).

Any other things that spring to mind (excluding the layout randomization flag)? If not, do we know that's all of it? Because if we know compiler-version + opt-level are the only things beside explicit randomization that could cause ABI changes with the same signatures, the "perfect linkage canaries" (those that would give 100% guarantee of ABI-agreement) are actually doable today as a crate 😄

Request for mentorship

About the generic future thing: I'd love to give it a try, but the future lang-feature requires that there be exactly 0 generics on the trait. I tried changing that requirement in compiler/rustc_hir/src/lang_items.rs, but since the library is built first, that doesn't work without staging the build. I suppose the staging would look like:

  1. Change requirement for Future to GenericRequirement::Minimum(0),
  2. Run x.py build
  3. Make x.py aware that I want to use the result of that build as the compiler for the next build (that's the part I need help on, I have no idea how to do that 😢)
  4. Add the generics to Future in the library
  5. Profit locally, look for things that the generic might break despite the default being there.
  6. Figure out the workflow to make that happen on the main repo should an RFC on Generic Futures be approved

@programmerjake
Copy link
Member

  1. Make x.py aware that I want to use the result of that build as the compiler for the next build (that's the part I need help on, I have no idea how to do that cry)

generally what's done is by using #[cfg(bootstrap)] in the library, that is disabled when compiling the library with your modified compiler and enabled when compiling with the bootstrap compiler (probably beta)

e.g. here a new intrinsic is only used when #[cfg(not(bootstrap))] since then the compiler has the new code to understand it:
https://github.com/rust-lang/rust/blob/eb3e9c1f45981b47160543cfd882ca00e69bbfab/library/core/src/option.rs#L777-L793

@workingjubilee
Copy link
Member

workingjubilee commented Mar 31, 2023

Any other things that spring to mind (excluding the layout randomization flag)?

...uh. Whether the function is publicly visible or not. ( I think not in the Rust sense but in the "exported from the binary" sense? ) There's an optimization pass that LLVM runs that can change the calling convention of functions that are known to not be publicly exposed.

@p-avital
Copy link
Author

...uh. Whether the function is publicly visible or not. ( I think not in the Rust sense but in the "exported from the binary" sense? ) There's an optimization pass that LLVM runs that can change the calling convention of functions that are known to not be publicly exposed.

Makes sense, I wouldn't even rely on a non-publicly visible function existing in the binary (since it could also have been inlined on every call-site).

Sorry for pestering you so much. Hopefully I'll pay you back for your time by making the Rust-to-Rust-dynlink experience better by trying to regroup all of this :)

@p-avital p-avital mentioned this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants