Inline context.{get,set} in components#13194
Inline context.{get,set} in components#13194alexcrichton merged 3 commits intobytecodealliance:mainfrom
context.{get,set} in components#13194Conversation
This commit reimplements the `context.{get,set}` intrinsics in the
component model, introduced in the component-model-async and
component-model-threading proposals. The intent of these intrinsics in
WASIp3, for example, are intended to replace the `global`s used for the
stack pointer and TLS base in previous modules, for example. The
implementation of loading from a `global` is a single load instruction,
whereas the previous implementation of `context.get` was a full libcall,
which is significantly more expensive. The goal of this PR is to ensure
that the transition to using `context.get` and `context.set` for
high-performance uses retains the same performance as the WASIp2
constructs.
Specifically the storage for `context.{get,set}` slots have been moved
into the `VMStoreContext` structure which has a known layout to compiled
code. There still remains storage within each `GuestThread` because
there's only one store, and the idea is that whenever threads are
switched between the switch operation is slightly more expensive now
where it has to update and maintain the state in the store. The
rationale for this is that it'll be far more often that these values are
accessed rather than threads being swapped between.
The implementation chosen in this commit is to model the
`context.{get,set}` intrinsics as `UnsafeIntrinsic`s. This is a bit of a
shoehorn where they're not actually unsafe, but all of the plumbing and
support for `UnsafeIntrinsic` is effectively exactly what these want. To
avoid duplicating lots of infrastructure that's where these now reside.
The `concurrent.rs` implementation has been updated to save/restore
context from the store, and this additionally updates a few other switch
points to ensure that the store never switches away or to a deleted
thread. This niche situation happened in a few scenarios with no impact
from before, but with the switching implementation having to access
threads it became load-bearing that these must be valid.
The end result is that with `-Cinlining` the `context.{get,set}`
instructions are two instructions instead of a full libcall. One
instruction is loading `VMStoreContext`, which is GVN-able and
hoist-able, while the other is the actual load/store. This is the same
as the performance of the stack pointer being in an imported global,
for example.
| | UnsafeIntrinsic::ContextSetI32_1 => ir::types::I32, | ||
| _ => unreachable!(), | ||
| }; | ||
| let context_slot_size = 4; |
There was a problem hiding this comment.
Should this be ty.bytes() or something to future proof for memory64?
There was a problem hiding this comment.
Eventually, yes, but not yet unfortunately. The width of the type being loaded (the above match) and the width of the slot are independent and will need to be configured separately. The width above will be generalized to i64 eventually once the type is plumbed through, and the width of the slot will need to be increased when memory64 support is added. I'll add some comments and defensive assertions too.
| UnsafeIntrinsic::ContextGetI32_0 | UnsafeIntrinsic::ContextGetI32_1 => { | ||
| let context = self.builder.ins().load( | ||
| ty, | ||
| MemFlags::trusted(), |
There was a problem hiding this comment.
Unless you set the can move flag, we won't actually hoist these out of loops, which you imply we will do in the commit message.
There was a problem hiding this comment.
Oh the hoistable load was the *mut VMStoreContext pointer load, not this one.
This commit reimplements the
context.{get,set}intrinsics in the component model, introduced in the component-model-async and component-model-threading proposals. The intent of these intrinsics in WASIp3, for example, are intended to replace theglobals used for the stack pointer and TLS base in previous modules, for example. The implementation of loading from aglobalis a single load instruction, whereas the previous implementation ofcontext.getwas a full libcall, which is significantly more expensive. The goal of this PR is to ensure that the transition to usingcontext.getandcontext.setfor high-performance uses retains the same performance as the WASIp2 constructs.Specifically the storage for
context.{get,set}slots have been moved into theVMStoreContextstructure which has a known layout to compiled code. There still remains storage within eachGuestThreadbecause there's only one store, and the idea is that whenever threads are switched between the switch operation is slightly more expensive now where it has to update and maintain the state in the store. The rationale for this is that it'll be far more often that these values are accessed rather than threads being swapped between.The implementation chosen in this commit is to model the
context.{get,set}intrinsics asUnsafeIntrinsics. This is a bit of a shoehorn where they're not actually unsafe, but all of the plumbing and support forUnsafeIntrinsicis effectively exactly what these want. To avoid duplicating lots of infrastructure that's where these now reside.The
concurrent.rsimplementation has been updated to save/restore context from the store, and this additionally updates a few other switch points to ensure that the store never switches away or to a deleted thread. This niche situation happened in a few scenarios with no impact from before, but with the switching implementation having to access threads it became load-bearing that these must be valid.The end result is that with
-Cinliningthecontext.{get,set}instructions are two instructions instead of a full libcall. One instruction is loadingVMStoreContext, which is GVN-able and hoist-able, while the other is the actual load/store. This is the same as the performance of the stack pointer being in an imported global, for example.