Skip to content

Conversation

tlively
Copy link
Member

@tlively tlively commented Jan 18, 2025

The current interpreter used in wasm-shell, the fuzzer, and
optimizations like precompute works by recursively walking the
expression tree and computing expression results as it goes. This kind
of recursive interpretation is not going to work for stack switching,
since stack switching requires stashing context away and restoring it
later. The recursive interpreter stores intermediate values on the
native stack and returns early to implement control flow, so there is no
way to suspend a computation and resume it later.

To support stack switching and support other use future interpreter use
cases such as running the full spec test suite and fuzzing multithreaded
programs, introduce a new interpreter that is not recursive and does not
store intermediate state that needs to persist beyond the execution of a
single instruction on the native stack. The new interpreter works by
iterating through instructions and visiting them one at a time in a
loop. The visitor pushes and pops values from a stack and signals
control flow via its return values. Control flow transfers are handled
by the main interpreter loop, so expressions are only visited when they
are actually executed. This design will not only support stack switching
and other features better than the old interpreter, but it will also
significantly decrease the amount of code in the interpreter.

In addition to the core interpreter loop, also lay out a skeleton of the
execution context for the new interpreter, including a call stack and
store. The code contains several TODOs describing how these runtime
structures will need to be extended to support interpreting the full
spec test suite, including the ability to interpret over multiple linked
instances at once.

Most of the actual interpretation of expressions is left as future work,
but the interpretation of Const expressions and i32.add is implemented
and tested in a new gtest file to demonstrate it working end-to-end. One
of the first milestones for the new interpreter will be getting real
spec tests running with it, at which point the gtest file can be
removed.

The current interpreter used in wasm-shell, the fuzzer, and
optimizations like precompute works by recursively walking the
expression tree and computing expression results as it goes. This kind
of recursive interpretation is not going to work for stack switching,
since stack switching requires stashing context away and restoring it
later. The recursive interpreter stores intermediate values on the
native stack and returns early to implement control flow, so there is no
way to suspend a computation and resume it later.

To support stack switching and support other use future interpreter use
cases such as running the full spec test suite and fuzzing multithreaded
programs, introduce a new interpreter that is not recursive and does not
store intermediate state that needs to persist beyond the execution of a
single instruction on the native stack. The new interpreter works by
iterating through instructions and visiting them one at a time in a
loop. The visitor pushes and pops values from a stack and signals
control flow via its return values. Control flow transfers are handled
by the main interpreter loop, so expressions are only visited when they
are actually executed. This design will not only support stack switching
and other features better than the old interpreter, but it will also
significantly decrease the amount of code in the interpreter.

In addition to the core interpreter loop, also lay out a skeleton of the
execution context for the new interpreter, including a call stack and
store. The code contains several TODOs describing how these runtime
structures will need to be extended to support interpreting the full
spec test suite, including the ability to interpret over multiple linked
instances at once.

Most of the actual interpretation of expressions is left as future work,
but the interpretation of `Const` expressions and i32.add is implemented
and tested in a new gtest file to demonstrate it working end-to-end. One
of the first milestones for the new interpreter will be getting real
spec tests running with it, at which point the gtest file can be
removed.

// TODO: generalize this so different users can override memory loads and
// stores, etc.
struct WasmStore {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "store" in the sense of the spec, I assume?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Unfortunately just calling it Store caused too many conflicts with our Store expression class.

std::vector<Frame> callStack;

Frame& getFrame() {
assert(callStack.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(callStack.size());
assert(!callStack.empty());

return callStack.back();
}

Literal pop() { return getFrame().pop(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the name, I expected this to pop the stack frame, but it looks like it gets the stack frame and pops inside that. So it is popping from the value stack and not the call stack?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. push and pop are going to be the most common operations that affect the interpreter context, so everything has helpers to make them as easy as possible to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm. might be worth a comment to avoid the possible confusion.

* limitations under the License.
*/

#include "interpreter.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "interpreter.h"
#include "interpreter/interpreter.h"

I believe we generally do includes from src. There is potential ambiguity if we also use local paths in the same dir.


// TODO: This is a quick and dirty hack. We should implement a proper iterator
// in ir/iteration.h that keeps only a vector of (Expression*, index) pairs to
// find the current location in the epxression tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// find the current location in the epxression tree.
// find the current location in the expression tree.


// The list of remaining instructions in reverse order so we can pop from the
// back to advance the iterator.
std::vector<Expression*> exprs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this work with loops etc.?

Generally, I was guessing this would use BinaryenIR itself. We could track the location of the frame pointer using a simple indexing, a vector of indexes basically,

[ index at toplevel, index at the higher level, index at a higher level still, ...]

So [1, 0] would mean we are at the second instruction at the toplevel, and the first of its children, etc. If we keep pointers each of the children then we can quickly traverse between them etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually we have ChildIterator already. So a stack of ChildIterators could work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's what the TODO above the class is about. If for some reason we wanted to implement loops before implementing that better expression iterator, we would have to copy the vector-based iterator at each loop header and store it in a map on the frame. There are TODOs for those parts in Frame and the ExpressionIterator constructor. It's probably better to implement the better iterators first, though.

All other control flow transfers are forward edges, so they are very simple with the vector-based iterators. You just increment the iterator until you find the block matching the target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ChildIterator for this iteration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a stack of ChildIterators would work if we don't mind the memory overhead of materializing the children of each expression in the current path. Alternatively we could do something more specialized that only stores an Expression* and an Index for each expression in the path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChildIterator contains SmallVector<Expression**, 4> children;. In principle it could be reimplemented to contain only a single Expression* and an index.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then each step would need to find the children again? That seems likely to be slower.

I might think otherwise if memory caching issues were possible, like it we were storing large amounts of such vectors. But typically they would be quite small?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, I am open to being convinced otherwise of course! And if that is faster, we can implement that in ChildIterator as an optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might not matter, and if it does, improving ChildIterator would make sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the TODO to capture all of this.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!

@tlively tlively merged commit 09300e4 into main Jan 22, 2025
13 checks passed
@tlively tlively deleted the new-interpreter branch January 22, 2025 21:21
This was referenced Feb 1, 2025
ashleynh added a commit that referenced this pull request Feb 1, 2025
Building on top of #7227, i32.mul is implemented and tested.
ashleynh added a commit that referenced this pull request Feb 1, 2025
Building on top of #7227, i32.sub is implemented and tested.
@ashleynh ashleynh mentioned this pull request Feb 3, 2025
ashleynh added a commit that referenced this pull request Feb 3, 2025
Building on top of #7227, i32.mul is implemented and tested.
@ashleynh ashleynh mentioned this pull request Feb 25, 2025
ashleynh added a commit that referenced this pull request Feb 25, 2025
Building on top of #7227, the following are implemented and tested:
- f32.add
- f32.sub
- f32.mul
- f32.div
- f32.sqrt
- f32.ceil
- f32.floor
- f32.trunc
- f32.nearbyint
This was referenced Feb 25, 2025
ashleynh added a commit that referenced this pull request Feb 26, 2025
Building on top of #7227 , the following are implemented and tested:

- f64.add
- f64.sub
- f64.mul
- f64.div
- f64.sqrt
- f64.ceil
- f64.floor
- f64.trunc
- f64.nearbyint
ashleynh added a commit that referenced this pull request Feb 26, 2025
Building on top of #7227, the following are implemented and tested:
- i64.add
- i64.sub
- i64.mul
- i64.eq
- i64.ltS
- i64.ltU
- i64.gtS
- i64.gtU

While here, I added in relevant cases that leveraged the code added for
the above instructions.
- i32.eq
- f32.eq
- f64.eq

- i32.ltS
- i32.ltU
- i32.gtS
- i32.ltU
ashleynh added a commit that referenced this pull request Feb 27, 2025
Building on #7227, implemented and tested the following:
- i32/i64.and
- i32/i64.or
- i32/i64.xor
- i32/i64.shl
- i32/i64.shrU
- i32/i64.shrS
- i32/i64.rotL
- i32/i64.rotR
ashleynh added a commit that referenced this pull request Mar 5, 2025
Building on #7227, implemented and tested the following:
- i32/i64.and
- i32/i64.or
- i32/i64.xor
- i32/i64.shl
- i32/i64.shrU
- i32/i64.shrS
- i32/i64.rotL
- i32/i64.rotR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants