@CPerezz
Contributor

@CPerezz CPerezz commented Jan 2, 2026

🗒️ Description

Update SLOAD/SSTORE and multi-opcode stateful benchmark tests to support Fusaka's 16M transaction gas limit. Tests now split large attack transactions into multiple smaller ones that each respect the tx_gas_limit cap while still filling the entire block gas budget.

Changes:

  • Added tx_gas_limit fixture parameter to all affected test functions in test_single_opcode.py and test_multi_opcode.py
  • Calculate num_txs = max(1, gas_benchmark_value // tx_gas_limit) to determine how many transactions are needed
  • Split single large attack transaction into multiple smaller transactions in a list
  • Updated gas calculations to be per-transaction rather than per-block
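
As a rough illustration of the splitting arithmetic listed above, the sketch below computes `num_txs` and a per-transaction gas limit. The helper name `plan_transactions` and the sample values (a 34M gas budget with a 16M cap) are assumptions for illustration, not code from this PR.

```python
def plan_transactions(gas_benchmark_value: int, tx_gas_limit: int) -> list[int]:
    """Return the gas limit of each attack transaction (hypothetical helper)."""
    # Formula from the PR description: always send at least one transaction.
    num_txs = max(1, gas_benchmark_value // tx_gas_limit)
    # A single transaction can use neither more than the cap nor more than
    # the whole benchmark budget.
    per_tx_gas = min(tx_gas_limit, gas_benchmark_value)
    return [per_tx_gas] * num_txs


# Illustrative values only: 34M gas budget, 16M per-transaction cap.
limits = plan_transactions(34_000_000, 16_000_000)
print(len(limits), sum(limits))
```

With floor division, any remainder smaller than one full `tx_gas_limit` is not assigned to an extra transaction in this sketch; whether the tests handle that remainder differently is not shown in the description.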

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • Tests: Verified SLOAD and SSTORE benchmarks work with --transaction-gas-limit 16000000 and --gas-benchmark-values 34 on Anvil

@CPerezz CPerezz changed the base branch from forks/osaka to forks/amsterdam January 2, 2026 22:33
Update SLOAD/SSTORE and multi-opcode stateful benchmark tests to
support Fusaka's 16M transaction gas limit. Tests now split large
attack transactions into multiple smaller ones that each respect the
tx_gas_limit cap while still filling the entire block gas budget.
@CPerezz CPerezz force-pushed the feat/update-stateful-benches-osaka branch from 035ff6c to 2762e3a Compare January 2, 2026 22:47
@codecov

codecov bot commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.33%. Comparing base (d618e52) to head (186382d).
⚠️ Report is 33 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #1962   +/-   ##
================================================
  Coverage            86.33%   86.33%           
================================================
  Files                  538      538           
  Lines                34557    34557           
  Branches              3222     3222           
================================================
  Hits                 29835    29835           
  Misses                4148     4148           
  Partials               574      574           
Flag        Coverage Δ
unittests   86.33% <ø> (ø)

@jochem-brouwer
Member

I think we should update these tests to use BenchmarkTest, as this supports the logic which I think this PR also adds. See:

for i in range(num_splits):
    split_tx = tx.model_copy()
    split_tx.gas_limit = HexNumber(
        remaining_gas if i == num_splits - 1 else gas_limit_cap
    )
    remaining_gas -= gas_limit_cap
    split_tx.nonce = HexNumber(tx.nonce + i)
    split_transactions.append(split_tx)

This works when the split transactions are plain "copies" with no per-transaction differences. It does not cover cases where, for instance, the calldata must change per transaction, which we sometimes need (e.g. calldata that sets the starting salt of the CREATE2 address calculator).
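
A self-contained sketch of this copy-based splitting is shown below. The `Tx` dataclass is a hypothetical stand-in for the framework's transaction model, and computing `num_splits` by ceiling division is an assumption, since the quoted snippet does not show how it is derived.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tx:
    """Hypothetical stand-in for the framework's transaction model."""

    nonce: int
    gas_limit: int
    data: bytes = b""


def split_transaction(tx: Tx, gas_limit_cap: int) -> list[Tx]:
    """Split `tx` into copies whose gas limits each stay within the cap."""
    # Assumption: ceiling division gives the number of copies needed.
    num_splits = max(1, -(-tx.gas_limit // gas_limit_cap))
    remaining_gas = tx.gas_limit
    split_transactions = []
    for i in range(num_splits):
        # The last copy takes whatever gas is left; earlier copies use the cap.
        gas = remaining_gas if i == num_splits - 1 else gas_limit_cap
        remaining_gas -= gas
        # Only the gas limit and nonce change; the calldata stays identical.
        split_transactions.append(replace(tx, gas_limit=gas, nonce=tx.nonce + i))
    return split_transactions
```

Because every copy carries the same `data`, this only suits attacks that need no per-transaction input.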

CC @LouisTsai-Csie

@CPerezz
Contributor Author

CPerezz commented Jan 3, 2026

  • EXTCODESIZE/EXTCODECOPY/EXTCODEHASH attacks: The attack bytecode reads num_deployed_contracts from factory storage and iterates from salt 0 to that value. If we split into multiple identical transactions, each transaction would:
  1. Start at salt 0
  2. Access the same contracts again
  3. Those contracts would be warm on subsequent transactions (defeating the purpose of cold access benchmarks)
  • CALL attack: Same issue - iterates from salt 0 each time.
  • SLOAD/SSTORE attacks (in test_single_opcode.py): These call ERC20 balanceOf/approve on a fixed set of contracts, so every transaction accesses the same contracts instead of a different set.

So, to elaborate, BenchmarkTest.split_transaction cannot be used for these benchmarks because:

  • It creates identical transaction copies (same calldata)
  • Each copy would access the same addresses starting from salt 0
  • After the first transaction, those addresses are warm, not cold
  • This would fundamentally change what's being benchmarked

The current manual transaction splitting approach in test_multi_opcode.py has the same issue though - the transactions are also identical. To properly benchmark cold access across multiple transactions, the attack contracts would need to:

  • Accept a starting salt offset in calldata
  • Each transaction would pass a different offset

Just in case, can you confirm this makes sense, @LouisTsai-Csie?

@jochem-brouwer
Member

Ah, I see. I gave that advice based on the diff, but I now understand that the attack contracts have to be updated so that they can pick up the attack at a given offset. And yes, the splitter cannot be used for non-identical transactions. However, I think at some point we should support the case where the transactions differ in calldata, since this is currently the go-to way to make attacks start at a specific offset. We could refactor later and add this feature to the transaction splitter, so that each transaction's calldata is derived from a start offset that is increased for each new transaction.

Member

@danceratopz danceratopz left a comment

Hey, @CPerezz! Could you please take a look at the comment on the overhead_per_contract in test_single_opcode? Once that's addressed, I think we can merge as-is and address any refactors in a follow-up PR.

Carsons-Eels pushed a commit to Carsons-Eels/execution-specs that referenced this pull request Jan 6, 2026
…nceof

Adds per-contract overhead calculation to accurately account for loop
setup/teardown costs in the gas budget calculation.
@CPerezz CPerezz requested a review from danceratopz January 6, 2026 22:41
@CPerezz
Contributor Author

CPerezz commented Jan 6, 2026

I think everything is addressed, @danceratopz.

Could you merge if everything is indeed ok?

Member

@danceratopz danceratopz left a comment

Thanks for the update @CPerezz! Excuse me being pedantic, but I was going to suggest some clean-up and came across the inconsistency below.

Replace arbitrary 1000 gas cleanup reservation with precise
calculations for setup_overhead, cleanup_overhead, and
loop_condition_overhead in test_multi_opcode.py tests.

This aligns with the approach used in test_single_opcode.py.

Use correct EVM gas constants for opcode costs:
- G_BASE (2) -> G_VERY_LOW (3) for DUP, ISZERO, MLOAD, MSTORE, SUB
- G_MID (8) -> G_HIGH (10) for JUMPI
- G_LOW (5) -> G_VERY_LOW (3) for MLOAD, MSTORE

This fixes underestimation of loop_condition_overhead and
overhead_per_contract calculations in test_multi_opcode.py and
test_single_opcode.py.
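
To show how the corrected constants change an overhead estimate, the sketch below prices one loop-condition check before and after the fix. The opcode sequence (MLOAD, ISZERO, ISZERO, JUMPI) is an assumption chosen for illustration; the tier values are the ones named in the commit message above.

```python
# Gas tiers as named in the commit message above.
G_BASE, G_VERY_LOW, G_MID, G_HIGH = 2, 3, 8, 10

# Hypothetical loop-condition sequence, used only to illustrate the effect.
condition_opcodes = ["MLOAD", "ISZERO", "ISZERO", "JUMPI"]

# Costs before the fix: ISZERO priced as G_BASE, JUMPI as G_MID.
old_cost = {"MLOAD": G_VERY_LOW, "ISZERO": G_BASE, "JUMPI": G_MID}
# Costs after the fix: ISZERO is G_VERY_LOW, JUMPI is G_HIGH.
new_cost = {"MLOAD": G_VERY_LOW, "ISZERO": G_VERY_LOW, "JUMPI": G_HIGH}

old_total = sum(old_cost[op] for op in condition_opcodes)
new_total = sum(new_cost[op] for op in condition_opcodes)
print(old_total, new_total, new_total - old_total)
```

Under these assumptions the old constants underestimate each condition check by 4 gas, an error that accumulates over every loop iteration.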
Member

@danceratopz danceratopz left a comment

Hi @CPerezz, I think some of the gas constants used in the calculations and comments are incorrect. This is a real pain - we absolutely need to try to remove as much of this accounting burden from test authors as possible. I thought it might be easier to PR my suggestions here:

Otherwise, could you please address the remaining missing contract overhead? Sorry that these have landed in dribs and drabs!

…saka-suggestions

fix(test-benchmark): update gas constants in calculations and comments
@CPerezz CPerezz requested a review from danceratopz January 8, 2026 10:38
Account for per-contract loop setup/teardown overhead:
- SLOAD loop: MSTORE init + JUMPDEST + condition check (23 gas)
- SSTORE loop: MSTORE selector + MSTORE init + JUMPDEST + condition (26 gas)
- Total: 49 gas per contract

This aligns with the approach used in test_sload_empty_erc20_balanceof
and test_sstore_erc20_approve.
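
The totals in this commit message can be reproduced with the breakdown sketched below. It assumes the condition check consists of MLOAD, ISZERO, ISZERO and JUMPI; that composition is inferred from the figures rather than stated in the commit.

```python
# Standard EVM gas tiers used by the opcodes involved.
G_VERY_LOW, G_JUMPDEST, G_HIGH = 3, 1, 10

# Assumed condition check: MLOAD + ISZERO + ISZERO + JUMPI.
condition_check = G_VERY_LOW + G_VERY_LOW + G_VERY_LOW + G_HIGH

# SLOAD loop: MSTORE init + JUMPDEST + condition check.
sload_loop_overhead = G_VERY_LOW + G_JUMPDEST + condition_check
# SSTORE loop: one extra MSTORE for the selector on top of the same steps.
sstore_loop_overhead = G_VERY_LOW + sload_loop_overhead

overhead_per_contract = sload_loop_overhead + sstore_loop_overhead
print(sload_loop_overhead, sstore_loop_overhead, overhead_per_contract)
```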
CPerezz and others added 2 commits January 8, 2026 16:20
…saka-suggestions-2

fix(tests-benchmark): apply `overhead_per_contract` in `test_mixed_sload_sstore`; fix `JUMPI` gas
Member

@danceratopz danceratopz left a comment

Hi @CPerezz, let's try and get closure on this PR! 👍

Looks like we both added commits to address the missing overhead_per_contract. That's the main remaining thing to address in the scope of this PR. I would suggest we remove the second one added in bf5724a, could you please check the suggestion below that removes it?

I don't want to be overcritical, but unfortunately there are still issues in the gas accounting for overhead cost estimation (some pre-date this PR). One obvious offender is the superfluous 100 gas for the non-existent STATICCALL base cost.

I'm not sure how precise these calculations need to be for these tests, which target state growth (the attack is still valid). We should not address them in this PR, but if you think we should nail down exact costs, we could create a follow-up issue to review them. Perhaps you can comment on how important these costs are? A quick explanation in the next benchmarking call would be great. Thanks!

Comment on lines 770 to 780
# Per-contract fixed overhead (setup + teardown for each contract's loop)
overhead_per_contract = (
    gas_costs.G_VERY_LOW  # MSTORE to initialize counter (3)
    + gas_costs.G_JUMPDEST  # JUMPDEST at loop start (1)
    + gas_costs.G_VERY_LOW  # MLOAD for While condition check (3)
    + gas_costs.G_BASE  # ISZERO (2)
    + gas_costs.G_BASE  # ISZERO (2)
    + gas_costs.G_MID  # JUMPI (8)
    + gas_costs.G_BASE  # POP to clean up at end (2)
)

Member

Suggested change (removes these lines):
# Per-contract fixed overhead (setup + teardown for each contract's loop)
overhead_per_contract = (
    gas_costs.G_VERY_LOW  # MSTORE to initialize counter (3)
    + gas_costs.G_JUMPDEST  # JUMPDEST at loop start (1)
    + gas_costs.G_VERY_LOW  # MLOAD for While condition check (3)
    + gas_costs.G_BASE  # ISZERO (2)
    + gas_costs.G_BASE  # ISZERO (2)
    + gas_costs.G_MID  # JUMPI (8)
    + gas_costs.G_BASE  # POP to clean up at end (2)
)

@danceratopz
Member

@CPerezz just a follow-up comment and heads up about the gas accounting. @marioevz just proposed the following to help simplify gas cost calculations:

I would merge this PR with the suggestions above and then add an issue to refactor the gas accounting with #2002, once it's merged. Happy to help or even take the lead on that. How does that sound?

Member

@danceratopz danceratopz left a comment

Thanks for adding this @CPerezz!

@danceratopz danceratopz merged commit 2c83b84 into ethereum:forks/amsterdam Jan 14, 2026
15 checks passed
3 participants