Description
Background: previous general discussion about Binaryen and wasm stack machine is at #663. It turns out Since Binary 0xc, Binaryen's type system has been subtly wrong with respect to the spec, and both V8 and SpiderMonkey were not spec compliant in that way and so nobody noticed. More concretely, @kripken tried to fix that in #899 and that led to some more general discussion about what to do. Probably that discussion should be in an issue and not in a PR, so here's an issue.
@jgravelle-google and I have been discussing it, and we have an idea which essentially carries the discussion in #899 to an end which we think we like, but maybe you can poke holes in it.
Also to recap, the goals are:
- Valid wasm is accepted by and convertable to Binaryen IR (but not necessarily losslessly)
- Valid Binaryen is convertible to valid wasm
- Transforms on Binaryen IR should be context-free rather that context-sensitive (and as a special case, a valid node should be replaceable by another node of the same type and the IR should still be type-valid).
In short, we think we can allow blocks (and everything else in Binaryen IR) to have unreachable
type, and basically freeze those types into block signatures (i.e. reify them) when converting to wasm proper. Here are the Binaryen IR properties:
- Validation/type check (of nodes other than
br
) is context-free; in other words a node is valid based on its type and it contents only (the consequence of this is that transforms are also context-free). - Any IR node can have type
unreachable
. - A node (any kind of node with any type) is valid if:
a. all operand(s) (including those that follow unreachable nodes in blocks) are recursively valid, and
b. all operands type-check according to opcode-specific rules (unreachable
always type-checks)
c. A node withunreachable
type must have at least oneunreachable
operand unreachable
nodes may be reified based on their context (see paragraph on conversion to wasm below), and the result remains valid.br
nodes are valid if their operand type matches the type of the target (br
s targetingunreachable
-typed nodes are invalid)- (special rules for other node kinds?)
On conversion to wasm, types of all block
/loop
/if
must be inferred/reified to concrete types. Inference always happens from context (i.e. outside in) and not from e.g. inside-out. In other words (and in contrast to current Binaryen), block types are not inferred based on their last element, but based on their context; e.g. if they are the operand of a return
or are implicitly fall-through-returned, then they are converted to the return type of the function. This happens for every block we emit; we check the context and if it requires a particular type, we emit that type; otherwise it can be void. When this happens, we do not need to reify anything inside (rule 3).
There are a few non-intuitive consequences of these rules.
For example
(i32.add (f32.const 0) (br 0)))
is valid (and can have type unreachable
) because of rule 3b.
(block f32
(i32.add
(i32.const 1)
(br 0 (f32.const 0))
)
)
is valid and the i32.add
must have type unreachable
. It does not get reified.
(block ;; [unreachable]
(i32.add
(i32.const 1)
(br 1 (f32.const 0)) ;; targets outer block
)
)
If the block is reified to i32
then the add can stay unreachable, or have type i32. If the block is reified to something else then it's still valid (it's the first example again). In this nesting, it never needs to take a type (and it's beneficial).
I think this also means that it's a valid transform to take a node with an unreachable
child and make it unreachable
.
I think this also has the optimization properties that we want, and that conversion from wasm is workable but we haven't thought about all the permutations yet, so... yeah, comments and we can see if it all holds up.