
Implementing the exception handling proposal in Wasmtime #36

Open: wants to merge 1 commit into base: main

dhil

@dhil dhil commented Aug 30, 2024

This RFC proposes to implement the exception handling proposal in Wasmtime. At the time of writing, exception handling is a phase 4 proposal.

I think there are lots of details worth discussing about possible designs and strategies on how to realise them. I am hoping that this document can be used to seed those discussions here.

I would like to give credit to @fitzgen for guidance on how to put this RFC together as well as helping with developing the ideas, thanks!

Rendered.

@fitzgen
Member

fitzgen commented Aug 31, 2024

FYI, there are some discussions around how to support exception-throwing calls in the register allocator over in bytecodealliance/regalloc2#186 and some of that seems relevant for anyone interested in this RFC.

Comment on lines +118 to +121
We do not define a CLIF instruction for throwing an
exception. Instead, exception throwing must be done indirectly via an
imported function (e.g. a Wasmtime builtin libcall implemented in the
host/engine).
Member


I suspect we might actually want an instruction for throwing exceptions: for the overflow-flag ABI approach, we ideally don't want to tail call out to a custom asm-implemented function just to move the exception payload into a particular register, set the overflow flag, and return. We want to do that stuff inline. But I think we would be forced to do that suboptimal approach if we don't have a dedicated instruction.

Of course, when we are doing C++ ABI exceptions and DWARF, we will want to call out to _Unwind_RaiseException and friends instead.

I think we can choose between the two options in instruction selection with different lowering rules that look at the current calling convention. For the overflow-flags ABI, we'll need a custom calling convention (say, tail-overflow-exceptions) instead of our existing tail calling convention, and the lowering rules can check for it.

The other option would be a cranelift setting that gets set by the clif producer, similar to how TLS is done. This is a little funky to me tho because we need the new calling convention either way to control whether we do things like clear flags before regular returns or not, and so setting this theoretical option to the overflow-flags version of exceptions can still only work with that special calling convention, so it feels like we'd end up with two knobs to control roughly the same thing.
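To make the "one knob" argument concrete, here is a minimal Rust sketch in which the calling convention alone selects how a throw is lowered and whether normal returns must clear the flag. The `CallConv` enum below is a local, hypothetical stand-in (Cranelift's real `CallConv` has no `TailOverflowExceptions` variant), and `ThrowLowering` merely labels the two strategies discussed above.

```rust
/// Hypothetical local stand-in for a calling-convention enum; the
/// `TailOverflowExceptions` variant is the illustrative name floated above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CallConv {
    Tail,
    TailOverflowExceptions,
    SystemV,
}

/// How a hypothetical `throw` instruction would be lowered.
#[derive(Debug, PartialEq, Eq)]
pub enum ThrowLowering {
    /// Move the payload into the return register, set the flag, return inline.
    InlineFlagReturn,
    /// Call out to the platform unwinder (e.g. `_Unwind_RaiseException`).
    CallUnwinder,
}

/// The calling convention is the only input: no separate setting is consulted.
pub fn lower_throw(cc: CallConv) -> ThrowLowering {
    match cc {
        CallConv::TailOverflowExceptions => ThrowLowering::InlineFlagReturn,
        CallConv::Tail | CallConv::SystemV => ThrowLowering::CallUnwinder,
    }
}

/// The same knob decides whether a normal return must clear the flag first.
pub fn must_clear_flag_on_return(cc: CallConv) -> bool {
    cc == CallConv::TailOverflowExceptions
}
```

Because both decisions key off the same value, a second setting could only ever agree with the calling convention or be invalid, which is the redundancy described above.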

Member

@alexcrichton alexcrichton left a comment


Thanks for writing all this up! I'll cc @bjorn3 as well here since they've done work in this area with rustc_codegen_cranelift and likely have thoughts as well.

One thing which might also be worth noting in this RFC is that we probably can't do away with the longjmp/setjmp that Wasmtime uses today to implement traps. Notably that enables recovery from a signal handler and additionally enables O(1) recovery in "deep" situations like stack overflow. I was hoping we could use exceptions to implement that as well but I'm less sure of that now.

pointer-sized integer (morally the exception value), e.g. in CLIF
syntax
```clif
catch block123(v456: i64):
```
Member


There's some discussion on bytecodealliance/regalloc2#186 about this too, but I think there's a case to be made for not adding this restriction: in wasm, at least, you can branch to unwind handlers just like normal blocks, so wasm is at least one consumer that would otherwise need to work around it.

into a `catch` block. Instead, the control flow edges to `catch`
must come via a `try_call` instruction.

* A new call instruction `try_call <ok_label>, <exception_label>`,
Member


One part perhaps worth pointing out here is that DWARF supports multiple unwind locations per try_call, so <exception_label> may want to actually be a list of labels. I believe that @bjorn3's initial work for rustc_codegen_cranelift modeled this with a JumpTable where the "default label" was the ok_label and everything else was an unwind location.
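A rough sketch of that JumpTable-style encoding: one default (normal-return) target plus an ordered list of exceptional targets. `Block`, `Tag`, and `TryCallTable` are simplified, hypothetical types, not Cranelift's actual definitions; the point is that such a `try_call` exposes an ordinary successor list, so generic CFG code can treat it as one more multi-successor branch.

```rust
/// Simplified, hypothetical stand-ins for CLIF entities.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Block(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tag(pub u32);

/// `try_call` targets: the default entry is the normal-return block,
/// the remaining entries are unwind destinations, checked in order.
#[derive(Debug)]
pub struct TryCallTable {
    pub ok: Block,
    /// `None` marks a catch-all handler.
    pub handlers: Vec<(Option<Tag>, Block)>,
}

impl TryCallTable {
    /// All CFG successors, normal edge first.
    pub fn successors(&self) -> Vec<Block> {
        std::iter::once(self.ok)
            .chain(self.handlers.iter().map(|&(_, b)| b))
            .collect()
    }

    /// Where an exception carrying `tag` lands: the first handler, in order,
    /// that either matches the tag or is a catch-all.
    pub fn handler_for(&self, tag: Tag) -> Option<Block> {
        self.handlers
            .iter()
            .find(|(t, _)| *t == Some(tag) || t.is_none())
            .map(|&(_, b)| b)
    }
}
```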

Before returning normally any function must clear the flag, e.g.

```
test al, al
```
Member


One possible alternative to the overflow flag is the carry flag which has jc for jumping and dedicated clc and stc instructions for clearing/setting the carry flag (they're a single byte too!)
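As an illustration of the flag-based ABI (using the carry-flag variant suggested here), a minimal Rust model of what a callee and the caller's `try_call` lowering would do. `FlagReturn` is a hypothetical stand-in for "return register plus CPU flag"; real generated code would use `clc`/`stc` and a `jc` rather than a `bool`. The model also makes the non-exceptional cost visible: every call site performs one flag test even when nothing is thrown.

```rust
/// Hypothetical model: `payload` is what sits in the return register,
/// `carry` is the flag the caller tests after the call.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FlagReturn {
    pub payload: u64,
    pub carry: bool,
}

/// Callee: a throw sets the flag (`stc`), a normal return clears it (`clc`).
pub fn callee(x: u64) -> FlagReturn {
    if x == 0 {
        // Throw: exception payload in the return register, then `stc`.
        FlagReturn { payload: 0xdead, carry: true }
    } else {
        // Normal return: result in the return register, then `clc`.
        FlagReturn { payload: 100 / x, carry: false }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Landed {
    Ok(u64),
    Caught(u64),
}

/// Caller: right after the call, one `jc catch_block` picks the successor.
pub fn caller(x: u64) -> Landed {
    let r = callee(x);
    if r.carry {
        Landed::Caught(r.payload)
    } else {
        Landed::Ok(r.payload)
    }
}
```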

Comment on lines +170 to +172
We reckon this approach is relatively low overhead, and it is something
we can confidently implement correctly more quickly than the side
table or DWARF unwinder strategies. Adopting this strategy would allow
Member


One thing worth pointing out about this approach is that given the non-zero-cost in the non-exceptional case it will likely prevent turning this proposal on by default. The cost would be incurred for users who don't use exceptions at all unless Wasmtime implements a form of detection of exception-using-instructions which I think could get particularly hairy in the cross-instance semantics below.

Comment on lines +227 to +228
* How should we represent three-way results in the Wasmtime public
API?
Member


For this open question I think it'd be reasonable to take inspiration from the JS API for exception handling, notably we'd have exported Tag structures which the embedder could create or acquire from instances. Exceptions themselves would probably be modeled as something that can be converted to anyhow::Error and then host functions would return that error to indicate they want to throw an exception. Afterwards how this is represented internally from that point is just an implementation detail.
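A minimal sketch of the "exception as an error value" idea, using only the standard library (`Box<dyn Error + Send + Sync>`) so it stays self-contained; `anyhow::Error` offers a comparable `downcast_ref`. `Tag`, `WasmException`, `host_parse`, and `classify` are all hypothetical names for illustration, not Wasmtime APIs. The host "throws" by returning the exception as its error, and the runtime tells a thrown exception apart from any other failure by downcasting.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for an exported exception tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tag(pub u32);

/// Hypothetical exception value: a tag plus its payload.
#[derive(Debug, Clone, PartialEq)]
pub struct WasmException {
    pub tag: Tag,
    pub payload: Vec<i64>,
}

impl fmt::Display for WasmException {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "wasm exception with tag {}", self.tag.0)
    }
}

impl Error for WasmException {}

type HostResult = Result<i64, Box<dyn Error + Send + Sync>>;

/// A host function that throws (returns the exception as an error) on bad input.
pub fn host_parse(s: &str, tag: Tag) -> HostResult {
    s.parse::<i64>().map_err(|_| {
        Box::new(WasmException { tag, payload: vec![s.len() as i64] })
            as Box<dyn Error + Send + Sync>
    })
}

/// Runtime side: a thrown exception is distinguished from other errors (e.g. traps).
pub fn classify(r: &HostResult) -> &'static str {
    match r {
        Ok(_) => "returned",
        Err(e) if e.downcast_ref::<WasmException>().is_some() => "threw",
        Err(_) => "trapped",
    }
}
```

How the runtime then turns a "threw" result into an unwind of Wasm frames stays an internal detail, as the comment says.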

* No support for the legacy exception handling
revision. Justification: the legacy revision is being phased out.

* No support for unwinding across host frames. Justification:


One could add to the justification that it is, in the general case, impossible to unwind arbitrary host code. Examples:

  1. .NET Jitted host code - it won't have any unwind info Wasmtime could hope to understand (.NET runtime uses an internal Windows-like unwind info even on Unix OSes).
  2. C code compiled without unwind info (and with omitted frame pointers, for good measure).
  3. Windows x86 native code does not support virtual unwinding at all.

Comment on lines +22 to +23
for exception handling is (at least) of interest to C++, Kotlin, and
OCaml toolchains. The proposal is also a prerequisite for the [stack


Suggested change
```diff
- for exception handling is (at least) of interest to C++, Kotlin, and
+ for exception handling is (at least) of interest to C++, .NET, Kotlin, and
  OCaml toolchains. The proposal is also a prerequisite for the [stack
```

The workaround we use in one of our toolchains to emulate exceptions has 10%+ code size cost (and probably even higher execution cost).

@cfallin
Member

cfallin commented Sep 4, 2024

Thanks for writing this up! I have a few thoughts mostly on the design of the CLIF functionality to encode exceptional control flow; I see a few others have commented on the points about special (catch) blocks, and there are some interesting design questions around how try_call applies block-param arguments as well that intersect with this.

To set a baseline first, I'll suggest the basic principle of: IR design should be as orthogonal as possible, i.e., features compose and special-cases or unsupported corners that require special handling are minimized.

As a corollary of that, if existing analysis and transform passes can work without having to be modified to be aware of exceptions, all the better. (This is the end-game of "put exceptional edges into the ordinary CFG", IMHO, with try_call as a "normal-ish" branch instruction.)

CLIF Design Questions

So I see at least three design questions here:

  • Are catch blocks their own kind of block/entity, or are they ordinary CFG blocks?

  • Can catch blocks (and the non-throwing "success" edge from try_call as well) take ordinary block parameters, or are they restricted not to do so?

  • How are the normal return value(s), and exceptional value(s), that result from try_call defined?

Q1: Distinct Kind of Catch-Blocks

I'd like to propose that the first question be resolved early to: catch-targets are ordinary CFG blocks (as also noted elsewhere in this PR). My reasoning is straightforward: a new kind of block, with its own restrictions, adds cognitive overhead and correctness questions to every analysis and pass in the compiler, increasing likelihood of bugs. Furthermore, it's not clear that there are reasons that require catch-targets to be distinct. We will want to codegen them as we do other basic blocks; an unwind that moves control to the handler address is just like an ordinary jump from the predecessor block. (If I'm missing some reason why they must be distinct, please let me know!)

Q2: Blockparams

The next question is whether we allow blockparams on blocks that are targets of try_call (normal or exceptional edge(s)). If catch-blocks were distinct, it might be tempting to limit them in this way. (Perhaps this was part of the implicit thinking in your proposal?) However, having a kind of block entity in the IR that cannot take blockparams is a severe "missing corner" in the backend, and imposes restrictions all the way up the compiler pipeline. No blockparam should be necessary on a block with a single predecessor, so if we have distinct catch-blocks, such a restriction is technically possible; however, it means that we would need "perfect blockparam discipline" throughout the compiler passes, never adding unnecessary blockparams. We have historically had performance issues with unnecessary blockparams where algorithms are too approximate, and we have a constant-phis pass as a result. While unnecessary blockparams are annoying as a performance issue, it is critical that they remain allowed. Allowing them is good compiler design, IMHO: it allows factoring of concerns, where we can have simpler transforms ("always add a blockparam for X", possibly a placeholder, etc.) and then normalizations/canonicalizations later.

A few concrete examples of passes that work by adding blockparams, and would be difficult to write if we had an "exception catch blocks are a special case" rule:

  • "maximal SSA" transforms and SSA-cuts for stitching together subgraphs of the CFG (e.g., in weval)
  • CFG reducibility transforms, if we ever add a Wasm backend or need this for region-based analysis
  • tail-merging/deduplication, when it finds multiple copies of the same code for different exception handlers and merges them (the blockparams are even "real" here in the sense that they merge different values)
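The "add blockparams freely, normalize later" split can be sketched as follows: a simplified, hypothetical cleanup in the spirit of a constant-phis pass (not Cranelift's actual implementation) that drops a blockparam whenever every incoming argument is the same value, ignoring back-edges that pass the param to itself. Values are plain integers here for brevity.

```rust
use std::collections::HashMap;

/// SSA values are plain integers in this sketch.
pub type Value = u32;

/// If every incoming argument for `param` is one value `v` (ignoring
/// self-references), the param is redundant and can be replaced by `v`.
pub fn redundant_param(param: Value, incoming: &[Value]) -> Option<Value> {
    let mut unique = None;
    for &v in incoming {
        if v == param {
            continue; // a back-edge passing the param to itself adds no information
        }
        match unique {
            None => unique = Some(v),
            Some(u) if u == v => {}
            Some(_) => return None, // two distinct values merge here: the param is real
        }
    }
    unique
}

/// Check every param of one block; `params[i]` is `(param, incoming args)`.
/// Returns the substitutions a later rewrite would apply to uses of removed params.
pub fn simplify_block(params: &[(Value, Vec<Value>)]) -> HashMap<Value, Value> {
    params
        .iter()
        .filter_map(|(p, inc)| redundant_param(*p, inc).map(|v| (*p, v)))
        .collect()
}
```

An earlier transform such as SSA-cut stitching can then add a param for every candidate value without worrying about precision; the cleanup removes the ones that turned out unnecessary.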

One objection might be that exceptional control flow could interact poorly with edge-moves, i.e., the moves that the register allocator has to insert to actually put blockparams in place, since we don't codegen the branch, it just "happens" via the external unwinder. However a catch-block reached from a try_call is reached via a multi-successor branch; as a multi-successor branch, try_call forces edge-moves into the heads of its successor blocks, so there are no moves that are "skipped" when an exception is caught.

Q3: try_call Result Values

try_call defines two sets of values, one or the other alternately (never both): the "normal" return value(s) of the called function, or exceptional state in terms of one or more value(s).

The former, normal results, are available only in the "fallthrough" block and blocks it dominates; exceptional state is available only in the "catch" block and the blocks it dominates. Conceptually, one can think of the values as being defined on the edges, or maybe implicitly in the headers of the appropriate successor blocks.

This is most like the semantics of existing blockparams, so blockparams are the natural way to write these definitions, as this RFC does.

There are two "bits of weirdness" that are worth addressing, however:

  • Though the values are at the IR level defined in successor blocks, they are at the instruction level defined starting at the call instruction, and this matters very much for the register allocator. We have some discussion of this in regalloc2#186 ("Support branch instructions that define their blockparams"). It's mostly a minor implementation detail from an IR design perspective, but it's worth noting that we need to present the defs to the regalloc at the try_call itself. This way the interference with, and participation in, edge-moves is computed appropriately.

  • The blockparams are in addition to the regular blockparams, noted above. (Perhaps this was also an implicit reason for your proposal avoiding blockparams on catch-blocks?) There are a few ways to syntactically separate the regular blockparams from try_call-produced ones, if we really want, but I'm not sure if I like them -- e.g. should we write

    ```clif
      fn0 = (i32, i32, i32) -> i64, i64
    block1(v0: i32, v1: i32, v2: i32):
      try_call fn0(v0, v1, v2), block2(v0, v0, results), block3(v1, exceptions)

    ;; v3, v4 are normal blockparams; v5, v6 are returns from fn0
    block2(v3: i32, v4: i32, v5: i64, v6: i64):
      ...

    ;; v7 is a normal blockparam; v8 is exception state
    block3(v7: i32, v8: i32):
    ```

    or omit the `results` / `exceptions` keywords and implicitly append?

    (I'll note by comparison that LLVM -- which uses phis rather than blockparams -- defines the exceptional state in the catch block with a landingpad instruction, separate from a phi instruction, so it makes a distinction in kind between the two. I'm not sure this is worth emulating though.)


Sorry for the lengthy reply here -- I suppose I consider the IR design aspect one of the most important parts here, as it has long-term implications on compiler complexity and ease of implementation, and the RFC presents one design point without going into too much depth on the why, so -- here are some starting points :-)

@fitzgen
Member

fitzgen commented Sep 9, 2024

> To set a baseline first, I'll suggest the basic principle of: IR design should be as orthogonal as possible, i.e., features compose and special-cases or unsupported corners that require special handling are minimized.
>
> As a corollary of that, if existing analysis and transform passes can work without having to be modified to be aware of exceptions, all the better. (This is the end-game of "put exceptional edges into the ordinary CFG", IMHO, with try_call as a "normal-ish" branch instruction.)

💯

> I'd like to propose that the first question be resolved early to: catch-targets are ordinary CFG blocks (as also noted elsewhere in this PR). My reasoning is straightforward: a new kind of block, with its own restrictions, adds cognitive overhead and correctness questions to every analysis and pass in the compiler, increasing likelihood of bugs. Furthermore, it's not clear that there are reasons that require catch-targets to be distinct. We will want to codegen them as we do other basic blocks; an unwind that moves control to the handler address is just like an ordinary jump from the predecessor block. (If I'm missing some reason why they must be distinct, please let me know!)

I had been assuming that we couldn't treat landing pads as normal blocks, but rather as something like alternative function entry points. But I think that was a mistaken assumption that I never questioned. I might have been assuming that we wouldn't allow the reuse of any values in the landing pad, as a simplification? But that seems overly extreme now. Anyways, if a block is used as both a landing pad and a regular control-flow successor, then the regalloc constraints of the try_call edge will need to force all live values onto the stack and then reload them in the landing pad (or in an auto-inserted critical-edge block just before the "landing pad"?), and we don't actually have to do anything differently from normal blocks? And the unwinder would need to be responsible for restoring callee-save registers as well.

Anyways, if the catch blocks are indeed an artificial constraint, then definitely we should not artificially distinguish them from regular blocks in the IR.

> The blockparams are in addition to the regular blockparams, noted above. (Perhaps this was also an implicit reason for your proposal avoiding blockparams on catch-blocks?) There are a few ways to syntactically separate the regular blockparams from try_call-produced ones, if we really want, but I'm not sure if I like them -- e.g. should we write
>
> ```clif
>   fn0 = (i32, i32, i32) -> i64, i64
> block1(v0: i32, v1: i32, v2: i32):
>   try_call fn0(v0, v1, v2), block2(v0, v0, results), block3(v1, exceptions)
>
> ;; v3, v4 are normal blockparams; v5, v6 are returns from fn0
> block2(v3: i32, v4: i32, v5: i64, v6: i64):
>   ...
>
> ;; v7 is a normal blockparam; v8 is exception state
> block3(v7: i32, v8: i32):
> ```
>
> or omit the `results` / `exceptions` keywords and implicitly append?

I would say that if we allow splicing the payloads into the block parameters at arbitrary positions, e.g.

```clif
try_call fn0(v0, v1, v2), block2(v0, results, v1), block3(exceptions, v2)
```

then we should/must keep the keywords. If we always append or prepend, then implicit seems fine by me.

The generality of unconstrained splicing seems nice from a user perspective, but also maybe like something we won't actually need. I think if we can't think of a we-will-definitely-need-it-for-X use case, we should just implicitly append or prepend.

> I consider the IR design aspect one of the most important parts here, as it has long-term implications on compiler complexity and ease of implementation

💯

@cfallin
Member

cfallin commented Sep 9, 2024

> I had been assuming that we couldn't treat landing pads as normal blocks, but rather as something like alternative function entry points. But I think that was a mistaken assumption that I never questioned. I might have been assuming that we wouldn't allow the reuse of any values in the landing pad, as a simplification? But that seems overly extreme now. Anyways, if a block is used as both a landing pad and a regular control-flow successor, then the regalloc constraints of the try_call edge will need to force all live values onto the stack and then reload them in the landing pad (or in an auto-inserted critical-edge block just before the "landing pad"?), and we don't actually have to do anything differently from normal blocks? And the unwinder would need to be responsible for restoring callee-save registers as well.
>
> Anyways, if the catch blocks are indeed an artificial constraint, then definitely we should not artificially distinguish them from regular blocks in the IR.

I'm relatively convinced at least right now that "normal jump" is the best way to think of (and design to ensure) the unwinder -- we'll indeed want to restore callee-saves as we unwind more nested frames for this to be the case. In other words, my mental model for an exceptional return is "just like a normal return except PC/RIP is over here", plus register(s) set with exceptional state, just as register(s) are set with return values on normal returns. (Someone please correct me if this is wrong!) An alternative world where catch blocks are separate function entries seems much more "wild" to me in the sense that it breaks assumptions in lowering and regalloc.
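A toy Rust model of that mental model, under stated assumptions: each frame records the return address of its active call, a side table (hypothetical here) maps call-site return addresses to landing-pad addresses, and popping a frame restores the callee-saved registers its prologue saved, exactly as its epilogue would on a normal return. The only difference from a normal return is the PC the caller resumes at. `Frame`, `Resume`, and `unwind` are illustrative names, not Wasmtime types.

```rust
use std::collections::HashMap;

// Hypothetical frame record. `frames[0]` is the outermost frame; the last
// entry is the frame that threw. `return_addr` points into the previous frame.
#[derive(Debug, Clone)]
pub struct Frame {
    pub return_addr: u64,
    /// Callee-saved registers this frame's prologue spilled (reg -> saved value).
    pub saved_callee_regs: HashMap<u8, u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Resume {
    pub frame_index: usize,
    pub pc: u64,
    pub payload: u64,
    pub regs: HashMap<u8, u64>,
}

/// Pop frames innermost-first until a call site with a landing pad is found.
pub fn unwind(
    frames: &[Frame],
    handlers: &HashMap<u64, u64>, // call-site return address -> landing pad
    mut regs: HashMap<u8, u64>,
    payload: u64,
) -> Option<Resume> {
    for i in (1..frames.len()).rev() {
        let frame = &frames[i];
        // Undo this frame's callee-save spills, as its epilogue would.
        regs.extend(frame.saved_callee_regs.iter().map(|(&r, &v)| (r, v)));
        if let Some(&pad) = handlers.get(&frame.return_addr) {
            // "A normal return, except PC is over here."
            return Some(Resume { frame_index: i - 1, pc: pad, payload, regs });
        }
    }
    None // no handler: the exception escapes the outermost frame
}
```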

One slightly in-the-weeds but relevant detail here is that as we compute the lowering block order from CLIF, and generate VCode blocks, if a catch-block is also a target of a normal branch, we'll end up splitting the critical edge: the exceptional edge comes from a block with multiple successors, and goes to a block with multiple predecessors. This will require a little bit of care to get right when we generate the unwind tables (usually the main block is associated with the CLIF-level label but here we want the block with edge-moves).

(EDIT: note this is also the case if we keep the catch-blocks as a separate kind of block, unreachable from normal branches, because two or more try_calls could share the same catch-blocks; so, orthogonal to this question per-se, rather I'm braindumping a thing I think we'll want to be careful about)
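A small Rust sketch of the detail above, assuming a CFG given as a successor map with plain integer block ids (hypothetical types, not Cranelift's): find critical edges (from a block with more than one successor to a block with more than one predecessor), and when emitting the unwind table, use the split block for an exceptional edge if one was inserted, since that is where the edge-moves live.

```rust
use std::collections::HashMap;

pub type Block = u32;

/// All critical edges, sorted. `succs[b]` lists b's successors.
pub fn critical_edges(succs: &HashMap<Block, Vec<Block>>) -> Vec<(Block, Block)> {
    // Count predecessors (one per incoming edge).
    let mut preds: HashMap<Block, usize> = HashMap::new();
    for targets in succs.values() {
        for &t in targets {
            *preds.entry(t).or_insert(0) += 1;
        }
    }
    let mut edges = Vec::new();
    for (&from, targets) in succs {
        if targets.len() > 1 {
            for &to in targets {
                if preds.get(&to).copied().unwrap_or(0) > 1 {
                    edges.push((from, to));
                }
            }
        }
    }
    edges.sort();
    edges
}

/// The address recorded for edge (from, to) in the unwind table: the block
/// holding that edge's moves, i.e. the split block if the edge was split.
pub fn landing_pad(from: Block, to: Block, split: &HashMap<(Block, Block), Block>) -> Block {
    split.get(&(from, to)).copied().unwrap_or(to)
}
```

For example, with a `try_call` in block 0 (normal successor 1, catch block 2) and an ordinary jump from block 3 into block 2, the edge 0 to 2 is critical, so the unwind entry for that call site should name the inserted split block rather than block 2.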

> I would say that if we allow splicing the payloads into the block parameters at arbitrary positions, e.g.
>
> ```clif
> try_call fn0(v0, v1, v2), block2(v0, results, v1), block3(exceptions, v2)
> ```
>
> then we should/must keep the keywords. If we always append or prepend, then implicit seems fine by me.
>
> The generality of unconstrained splicing seems nice from a user perspective, but also maybe like something we won't actually need. I think if we can't think of a we-will-definitely-need-it-for-X use case, we should just implicitly append or prepend.

Yeah, I'd tend to agree with the YAGNI principle here -- and between append and prepend, if no other reasons to lean either way, I might suggest that we append, because then we preserve the property that the block-call args on the targets line up with blockparams, and the new thing (try_call results) can take the extra logic to add the length of normal block args (or subtract result count from total blockparam count, whichever).
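Under the append convention, the arithmetic is as simple as described: the target's blockparams are the explicit block-call args followed by the call results, so the results occupy the index range starting at the number of explicit args. A minimal sketch with hypothetical helper names:

```rust
/// Blockparams of a try_call target under "append":
/// [explicit block-call args..., call results...].
pub fn appended_params<T: Clone>(block_args: &[T], results: &[T]) -> Vec<T> {
    let mut params = block_args.to_vec();
    params.extend_from_slice(results);
    params
}

/// Blockparam indices that receive the call results.
pub fn result_param_range(num_block_args: usize, num_results: usize) -> std::ops::Range<usize> {
    num_block_args..num_block_args + num_results
}
```

With the earlier `block2(v0, v0, results)` example and two results from `fn0`, the explicit args fill params 0 and 1 (`v3`, `v4`) and the results fill params 2 and 3 (`v5`, `v6`).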

@bjorn3
Contributor

bjorn3 commented Sep 10, 2024

> In other words, my mental model for an exceptional return is "just like a normal return except PC/RIP is over here", plus register(s) set with exceptional state, just as register(s) are set with return values on normal returns.

Exactly!

> Yeah, I'd tend to agree with the YAGNI principle here -- and between append and prepend, if no other reasons to lean either way, I might suggest that we append, because then we preserve the property that the block-call args on the targets line up with blockparams, and the new thing (try_call results) can take the extra logic to add the length of normal block args (or subtract result count from total blockparam count, whichever).

I believe I used prepend in my current implementation as cranelift-frontend appends the extra blockparams necessary for handling variables in SSA form to the user defined blockparams, so prepending the args for invoke/try_call makes cranelift-frontend handle it correctly without needing any changes.
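A simplified positional model of why prepending composes with an SSA builder that appends: each string names what occupies a slot. The target block is first created with a param for the implicit exception value; later the builder discovers a live variable `x`, appends a param for it, and appends the matching arg to the incoming branch. This is a hypothetical sketch of the interaction described, not cranelift-frontend's actual code.

```rust
/// Positional layout of the values a try_call passes to its target:
/// implicit (call-produced) values either before or after the explicit args.
pub fn layout(explicit: &[String], implicit: &[String], prepend: bool) -> Vec<String> {
    if prepend {
        implicit.iter().chain(explicit).cloned().collect()
    } else {
        explicit.iter().chain(implicit).cloned().collect()
    }
}

/// Do the passed values still line up with the block's params after an
/// SSA builder appends one param/arg pair?
pub fn still_aligned(prepend: bool) -> bool {
    let implicit = vec!["exn".to_string()];
    let mut explicit: Vec<String> = Vec::new();
    // The block's params as first created: only the implicit exception value.
    let mut params = layout(&explicit, &implicit, prepend);
    // The builder appends a param for live variable `x` and the matching arg.
    params.push("x".to_string());
    explicit.push("x".to_string());
    layout(&explicit, &implicit, prepend) == params
}
```

With prepending, both sides become `[exn, x]`; with appending, the block's params become `[exn, x]` while the passed values become `[x, exn]`, so the builder would need to know about try_call to insert its param before the results.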
