-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
baseline: Track stack evolution separately #418
Conversation
Benchmark together with #416.
|
4b4cdbc
to
9a03aeb
Compare
@@ -38,17 +39,19 @@ class Tracer | |||
} | |||
|
|||
void notify_instruction_start( // NOLINT(misc-no-recursion) | |||
uint32_t pc, const ExecutionState& state) noexcept | |||
uint32_t pc, intx::uint256* stack_top, int stack_height, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameters are not beautiful here. The best option seems to be std::span<const uint256>
, but we wait for C++20.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Silkworm we use gsl::span
from https://github.com/microsoft/GSL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not using std::span from C++20?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Silkworm is compatible with C++17.
Codecov Report
@@ Coverage Diff @@
## master #418 +/- ##
=======================================
Coverage 99.62% 99.62%
=======================================
Files 31 31
Lines 4013 4017 +4
=======================================
+ Hits 3998 4002 +4
Misses 15 15
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Please rebase. |
9a03aeb
to
b3b4aa4
Compare
lib/evmone/baseline.cpp
Outdated
@@ -193,17 +211,22 @@ evmc_result execute(const VM& vm, ExecutionState& state, const CodeAnalysis& ana | |||
const auto& cost_table = get_baseline_cost_table(state.rev); | |||
|
|||
const auto* const code = state.code.data(); | |||
auto code_it = code; // Code iterator for the interpreter loop. | |||
while (true) // Guaranteed to terminate because padded code ends with STOP. | |||
const auto stack_bottom = state.stack.top_item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not I got it, initially top points to the bottom, I was confused
const auto stack_bottom = state.stack.top_item; | |
const auto stack_top = state.stack.top_item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of confusing that now we there exist state.stack.top_item
and position.stack_top
and only one of them is valid.
I guess ExecutionState::Stack
should be removed soon and replaced with just a pair of stack storage and "initial_stack_top" pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because Baseline leaves .top_item
untouched but Advanced is manipulating it (as Baseline did before).
We could more Stack to AdvancedExecutionState, but this would allocate it after the stack storage. We can also migrate Advanced to StackTop
. All to be decided later and with low priority.
So far I used state.stack.storage
here instead.
It looks fine overall, but intuition why this helps optimizations better is a bit beyond me... |
b3b4aa4
to
08b2a5a
Compare
This is actually easy to explain. This is stack check for --- old.asm 2022-01-26 18:48:34.751835959 +0100
+++ new.asm 2022-01-26 18:48:48.287972455 +0100
@@ -1,10 +1,8 @@
#APP
# OP_ADD
#NO_APP
- movq (%r13), %rax ; %rax has current stack top
- movq %rax, %rcx
- subq %r15, %rcx
- addq $32, %rcx
- shrq $5, %rcx
- cmpl $2, %ecx
- jl .LBB3_690
+ movq %rbx, %rax ; %rbx has current stack top
+ subq %r14, %rax
+ shrq $5, %rax
+ cmpl $2, %eax
+ jl .LBB3_668 If we keep the stack top pointer in The same can be done for We can also further optimize stack height computation, see #420. Currently this is subq %r14, %rax ; stack_top - stack_bottom
shrq $5, %rax ; convert to sizeof(uint256) |
08b2a5a
to
af8116b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine as it is, but I'd vote to rename Stack::end()
-> Stack::bottom()
af8116b
to
a78f6bd
Compare
Track EVM stack top item outside of State object. This make the stack top pointer always visible to the compiler and it is able to better optimize the interpreter.
a78f6bd
to
0574645
Compare
Hmm. It looks like this change overflows 16MB stack in the Debug mode: https://app.circleci.com/pipelines/github/torquem-ch/silkworm/3728/workflows/2da56982-dbc3-4897-90dd-490b40278d8b/jobs/10490 |
Track EVM stack top item outside of State object. This make the stack
top pointer always visible to the compiler and it is able to better
optimize the interpreter.