-
Notifications
You must be signed in to change notification settings - Fork 799
[interpreter] Begin implementation of a new interpreter #7227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The current interpreter used in wasm-shell, the fuzzer, and optimizations like precompute works by recursively walking the expression tree and computing expression results as it goes. This kind of recursive interpretation is not going to work for stack switching, since stack switching requires stashing context away and restoring it later. The recursive interpreter stores intermediate values on the native stack and returns early to implement control flow, so there is no way to suspend a computation and resume it later. To support stack switching and support other use future interpreter use cases such as running the full spec test suite and fuzzing multithreaded programs, introduce a new interpreter that is not recursive and does not store intermediate state that needs to persist beyond the execution of a single instruction on the native stack. The new interpreter works by iterating through instructions and visiting them one at a time in a loop. The visitor pushes and pops values from a stack and signals control flow via its return values. Control flow transfers are handled by the main interpreter loop, so expressions are only visited when they are actually executed. This design will not only support stack switching and other features better than the old interpreter, but it will also significantly decrease the amount of code in the interpreter. In addition to the core interpreter loop, also lay out a skeleton of the execution context for the new interpreter, including a call stack and store. The code contains several TODOs describing how these runtime structures will need to be extended to support interpreting the full spec test suite, including the ability to interpret over multiple linked instances at once. Most of the actual interpretation of expressions is left as future work, but the interpretation of `Const` expressions and i32.add is implemented and tested in a new gtest file to demonstrate it working end-to-end. One of the first milestones for the new interpreter will be getting real spec tests running with it, at which point the gtest file can be removed.
|
||
// TODO: generalize this so different users can override memory loads and | ||
// stores, etc. | ||
struct WasmStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is "store" in the sense of the spec, I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Unfortunately just calling it Store
caused too many conflicts with our Store
expression class.
src/interpreter/store.h
Outdated
std::vector<Frame> callStack; | ||
|
||
Frame& getFrame() { | ||
assert(callStack.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(callStack.size()); | |
assert(!callStack.empty()); |
return callStack.back(); | ||
} | ||
|
||
Literal pop() { return getFrame().pop(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the name, I expected this to pop the stack frame, but it looks like it gets the stack frame and pops inside that. So it is popping from the value stack and not the call stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. push
and pop
are going to be the most common operations that affect the interpreter context, so everything has helpers to make them as easy as possible to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm. might be worth a comment to avoid the possible confusion.
src/interpreter/interpreter.cpp
Outdated
* limitations under the License. | ||
*/ | ||
|
||
#include "interpreter.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "interpreter.h" | |
#include "interpreter/interpreter.h" |
I believe we generally do includes from src. There is potential ambiguity if we also use local paths in the same dir.
|
||
// TODO: This is a quick and dirty hack. We should implement a proper iterator | ||
// in ir/iteration.h that keeps only a vector of (Expression*, index) pairs to | ||
// find the current location in the epxression tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// find the current location in the epxression tree. | |
// find the current location in the expression tree. |
|
||
// The list of remaining instructions in reverse order so we can pop from the | ||
// back to advance the iterator. | ||
std::vector<Expression*> exprs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this work with loops etc.?
Generally, I was guessing this would use BinaryenIR itself. We could track the location of the frame pointer using a simple indexing, a vector of indexes basically,
[ index at toplevel, index at the higher level, index at a higher level still, ...]
So [1, 0]
would mean we are at the second instruction at the toplevel, and the first of its children, etc. If we keep pointers each of the children then we can quickly traverse between them etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually we have ChildIterator already. So a stack of ChildIterators could work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's what the TODO above the class is about. If for some reason we wanted to implement loops before implementing that better expression iterator, we would have to copy the vector-based iterator at each loop header and store it in a map on the frame. There are TODOs for those parts in Frame
and the ExpressionIterator
constructor. It's probably better to implement the better iterators first, though.
All other control flow transfers are forward edges, so they are very simple with the vector-based iterators. You just increment the iterator until you find the block matching the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ChildIterator for this iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a stack of ChildIterators would work if we don't mind the memory overhead of materializing the children of each expression in the current path. Alternatively we could do something more specialized that only stores an Expression*
and an Index
for each expression in the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ChildIterator
contains SmallVector<Expression**, 4> children;
. In principle it could be reimplemented to contain only a single Expression*
and an index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then each step would need to find the children again? That seems likely to be slower.
I might think otherwise if memory caching issues were possible, like it we were storing large amounts of such vectors. But typically they would be quite small?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, I am open to being convinced otherwise of course! And if that is faster, we can implement that in ChildIterator as an optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it might not matter, and if it does, improving ChildIterator would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the TODO to capture all of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start!
Building on top of #7227, i32.mul is implemented and tested.
Building on top of #7227, i32.sub is implemented and tested.
Building on top of #7227, i32.mul is implemented and tested.
Building on top of #7227, the following are implemented and tested: - f32.add - f32.sub - f32.mul - f32.div - f32.sqrt - f32.ceil - f32.floor - f32.trunc - f32.nearbyint
Building on top of #7227 , the following are implemented and tested: - f64.add - f64.sub - f64.mul - f64.div - f64.sqrt - f64.ceil - f64.floor - f64.trunc - f64.nearbyint
Building on top of #7227, the following are implemented and tested: - i64.add - i64.sub - i64.mul - i64.eq - i64.ltS - i64.ltU - i64.gtS - i64.gtU While here, I added in relevant cases that leveraged the code added for the above instructions. - i32.eq - f32.eq - f64.eq - i32.ltS - i32.ltU - i32.gtS - i32.ltU
Building on #7227, implemented and tested the following: - i32/i64.and - i32/i64.or - i32/i64.xor - i32/i64.shl - i32/i64.shrU - i32/i64.shrS - i32/i64.rotL - i32/i64.rotR
Building on #7227, implemented and tested the following: - i32/i64.and - i32/i64.or - i32/i64.xor - i32/i64.shl - i32/i64.shrU - i32/i64.shrS - i32/i64.rotL - i32/i64.rotR
The current interpreter used in wasm-shell, the fuzzer, and
optimizations like precompute works by recursively walking the
expression tree and computing expression results as it goes. This kind
of recursive interpretation is not going to work for stack switching,
since stack switching requires stashing context away and restoring it
later. The recursive interpreter stores intermediate values on the
native stack and returns early to implement control flow, so there is no
way to suspend a computation and resume it later.
To support stack switching and support other use future interpreter use
cases such as running the full spec test suite and fuzzing multithreaded
programs, introduce a new interpreter that is not recursive and does not
store intermediate state that needs to persist beyond the execution of a
single instruction on the native stack. The new interpreter works by
iterating through instructions and visiting them one at a time in a
loop. The visitor pushes and pops values from a stack and signals
control flow via its return values. Control flow transfers are handled
by the main interpreter loop, so expressions are only visited when they
are actually executed. This design will not only support stack switching
and other features better than the old interpreter, but it will also
significantly decrease the amount of code in the interpreter.
In addition to the core interpreter loop, also lay out a skeleton of the
execution context for the new interpreter, including a call stack and
store. The code contains several TODOs describing how these runtime
structures will need to be extended to support interpreting the full
spec test suite, including the ability to interpret over multiple linked
instances at once.
Most of the actual interpretation of expressions is left as future work,
but the interpretation of
Const
expressions and i32.add is implementedand tested in a new gtest file to demonstrate it working end-to-end. One
of the first milestones for the new interpreter will be getting real
spec tests running with it, at which point the gtest file can be
removed.