-
Notifications
You must be signed in to change notification settings - Fork 414
feat(benchmark): support tx gas limit cap in stateful benchmarks #1962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(benchmark): support tx gas limit cap in stateful benchmarks #1962
Conversation
Update SLOAD/SSTORE and multi-opcode stateful benchmark tests to support Fusaka's 16M transaction gas limit. Tests now split large attack transactions into multiple smaller ones that each respect the tx_gas_limit cap while still filling the entire block gas budget.
035ff6c to
2762e3a
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #1962 +/- ##
================================================
Coverage 86.33% 86.33%
================================================
Files 538 538
Lines 34557 34557
Branches 3222 3222
================================================
Hits 29835 29835
Misses 4148 4148
Partials 574 574
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I think we should update these tests to use execution-specs/packages/testing/src/execution_testing/specs/benchmark.py Lines 393 to 400 in d618e52
This works for tests if the transactions are just "copies" and there are no differences necessary per tx like changing calldata for instance (which in some cases we need to change as for instance calldata inputs to starting salts of the CREATE2 address calculator) |
So, to elaborate,
The current manual transaction splitting approach in
JIC, can you confirm this makes sense @LouisTsai-Csie ? |
|
Ah I see, yes I was giving this advice based on the diff, but as I now understand this has to be updated such that the attack contracts support a way to pick up the attack at a certain offset. Yes, for non-identical transactions the splitter cannot be used. However I think at some point we should include the calldata case where the transactions differ, since this is currently the go-to to make attacks start at a specific offset (so we could refactor later and add this feature to the transaction splitter, where txs have calldata based on some start offset and on an increase of this offset for each new tx). |
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, @CPerezz! Could you please take a look at the comment on the overhead_per_contract in test_single_opcode? Once that's addressed, I think we can merge as-is and address any refactors in a follow-up PR.
…nceof Adds per-contract overhead calculation to accurately account for loop setup/teardown costs in the gas budget calculation.
|
i think everything is addressed @danceratopz Could you merge if everything is indeed ok? |
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @CPerezz! Excuse me being pedantic, but I was going to suggest some clean-up and came across the inconsistency below.
Replace arbitrary 1000 gas cleanup reservation with precise calculations for setup_overhead, cleanup_overhead, and loop_condition_overhead in test_multi_opcode.py tests. This aligns with the approach used in test_single_opcode.py.
Use correct EVM gas constants for opcode costs: - G_BASE (2) -> G_VERY_LOW (3) for DUP, ISZERO, MLOAD, MSTORE, SUB - G_MID (8) -> G_HIGH (10) for JUMPI - G_LOW (5) -> G_VERY_LOW (3) for MLOAD, MSTORE This fixes underestimation of loop_condition_overhead and overhead_per_contract calculations in test_multi_opcode.py and test_single_opcode.py.
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @CPerezz, I think some of the gas constants used in the calculations and comments are incorrect. This is a real pain - we need to absolutely try to remove as much of this accounting burden from test authors as possible. I thought it might be easier to PR my suggestions here:
Otherwise, could you please address the remaining missing contract overhead. Sorry that these have landed in drips and drabs!
…saka-suggestions fix(test-benchmark): update gas constants in calculations and comments
Account for per-contract loop setup/teardown overhead: - SLOAD loop: MSTORE init + JUMPDEST + condition check (23 gas) - SSTORE loop: MSTORE selector + MSTORE init + JUMPDEST + condition (26 gas) - Total: 49 gas per contract This aligns with the approach used in test_sload_empty_erc20_balanceof and test_sstore_erc20_approve.
…saka-suggestions-2 fix(tests-benchmark): apply `overhead_per_contract` in `test_mixed_sload_sstore`; fix `JUMPI` gas
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @CPerezz, let's try and get closure on this PR! 👍
Looks like we both added commits to address the missing overhead_per_contract. That's the main remaining thing to address in the scope of this PR. I would suggest we remove the second one added in bf5724a, could you please check the suggestion below that removes it?
I don't want to be overcritical, but unfortunately there are still issues in the gas accounting for overhead cost estimation (some pre-date this PR). One obvious offender is the superfluous 100 gas for the non-existant STATICCALL base cost.
I'm not sure how precise these calculations need be for these tests, which target state growth (the attack is still valid). We should not address them in this PR, but if you think we should nail down exact costs, we could create a follow-up issue to review them. Perhaps you can pass comment on how important these costs are? A quick explanation in the next benchmarking call would be great. Thanks!
| # Per-contract fixed overhead (setup + teardown for each contract's loop) | ||
| overhead_per_contract = ( | ||
| gas_costs.G_VERY_LOW # MSTORE to initialize counter (3) | ||
| + gas_costs.G_JUMPDEST # JUMPDEST at loop start (1) | ||
| + gas_costs.G_VERY_LOW # MLOAD for While condition check (3) | ||
| + gas_costs.G_BASE # ISZERO (2) | ||
| + gas_costs.G_BASE # ISZERO (2) | ||
| + gas_costs.G_MID # JUMPI (8) | ||
| + gas_costs.G_BASE # POP to clean up at end (2) | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Per-contract fixed overhead (setup + teardown for each contract's loop) | |
| overhead_per_contract = ( | |
| gas_costs.G_VERY_LOW # MSTORE to initialize counter (3) | |
| + gas_costs.G_JUMPDEST # JUMPDEST at loop start (1) | |
| + gas_costs.G_VERY_LOW # MLOAD for While condition check (3) | |
| + gas_costs.G_BASE # ISZERO (2) | |
| + gas_costs.G_BASE # ISZERO (2) | |
| + gas_costs.G_MID # JUMPI (8) | |
| + gas_costs.G_BASE # POP to clean up at end (2) | |
| ) |
|
@CPerezz just a follow-up comment and heads up about the gas accounting. @marioevz just proposed the following to help simplify gas cost calculations: I would merge this PR with the suggestions above and then add an issue to refactor the gas accounting with #2002, once it's merged. Happy to help or even take the lead on that. How does that sound? |
danceratopz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @CPerezz!
🗒️ Description
Update SLOAD/SSTORE and multi-opcode stateful benchmark tests to support Fusaka's 16M transaction gas limit. Tests now split large attack transactions into multiple smaller ones that each respect the
tx_gas_limitcap while still filling the entire block gas budget.Changes:
tx_gas_limitfixture parameter to all affected test functions intest_single_opcode.pyandtest_multi_opcode.pynum_txs = max(1, gas_benchmark_value // tx_gas_limit)to determine how many transactions are needed🔗 Related Issues or PRs
N/A.
✅ Checklist
type(scope):.--transaction-gas-limit 16000000and--gas-benchmark-values 34on AnvilCute Animal Picture