feat(benchmark): support tx gas limit cap in stateful benchmarks #1962

CPerezz · 2026-01-02T22:31:59Z

🗒️ Description

Update SLOAD/SSTORE and multi-opcode stateful benchmark tests to support Fusaka's 16M transaction gas limit. Tests now split large attack transactions into multiple smaller ones that each respect the tx_gas_limit cap while still filling the entire block gas budget.

Changes:

Added tx_gas_limit fixture parameter to all affected test functions in test_single_opcode.py and test_multi_opcode.py
Calculate num_txs = max(1, gas_benchmark_value // tx_gas_limit) to determine how many transactions are needed
Split single large attack transaction into multiple smaller transactions in a list
Updated gas calculations to be per-transaction rather than per-block
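
The splitting arithmetic in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: `plan_attack_txs` is a hypothetical helper, and it assumes each transaction simply receives the capped gas limit (or the whole budget when the budget is below the cap).

```python
def plan_attack_txs(gas_benchmark_value: int, tx_gas_limit: int) -> list[int]:
    """Return one gas limit per attack transaction (hypothetical helper)."""
    # Number of transactions, using the formula from the PR description.
    num_txs = max(1, gas_benchmark_value // tx_gas_limit)
    # Assumption: every transaction gets the cap, except when the whole
    # block budget is smaller than the cap.
    per_tx_gas = min(tx_gas_limit, gas_benchmark_value)
    return [per_tx_gas] * num_txs


# Example: a 34M-gas budget with a 16M per-transaction cap.
limits = plan_attack_txs(gas_benchmark_value=34_000_000, tx_gas_limit=16_000_000)
print(limits)
```

Because the formula uses floor division, a 34M budget with a 16M cap gives two transactions in this sketch, and the remaining 2M gas is not assigned to any transaction.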

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
Tests: Verified SLOAD and SSTORE benchmarks work with --transaction-gas-limit 16000000 and --gas-benchmark-values 34 on Anvil

Cute Animal Picture

codecov · 2026-01-02T23:29:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.33%. Comparing base (d618e52) to head (186382d).
⚠️ Report is 33 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #1962   +/-   ##
================================================
  Coverage            86.33%   86.33%           
================================================
  Files                  538      538           
  Lines                34557    34557           
  Branches              3222     3222           
================================================
  Hits                 29835    29835           
  Misses                4148     4148           
  Partials               574      574

Flag	Coverage Δ
unittests	`86.33% <ø> (ø)`

jochem-brouwer · 2026-01-03T00:39:23Z

I think we should update these tests to use BenchmarkTest, as this supports the logic which I think this PR also adds. See:

execution-specs/packages/testing/src/execution_testing/specs/benchmark.py

Lines 393 to 400 in d618e52

    
    for i in range(num_splits):
        split_tx = tx.model_copy()
        split_tx.gas_limit = HexNumber(
            remaining_gas if i == num_splits - 1 else gas_limit_cap
        )
        remaining_gas -= gas_limit_cap
        split_tx.nonce = HexNumber(tx.nonce + i)
        split_transactions.append(split_tx)

This works for tests where the split transactions are exact copies and nothing needs to differ per transaction. It does not cover cases where calldata must change per transaction, for instance when the calldata supplies the starting salt for the CREATE2 address calculator.
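
To make the "copies" behaviour concrete, here is a self-contained sketch of the same loop. It is not the framework code: `Tx` is a hypothetical stand-in for the real transaction model, plain integers replace `HexNumber`, and the way `num_splits` is computed (ceiling division) is an assumption, since the quoted excerpt does not show it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tx:
    """Hypothetical stand-in for the framework's transaction model."""

    nonce: int
    gas_limit: int
    data: bytes = b""


def split_transaction(tx: Tx, gas_limit_cap: int) -> list[Tx]:
    """Split one transaction into capped copies with consecutive nonces."""
    # Assumption: ceiling division decides how many copies are needed.
    num_splits = -(-tx.gas_limit // gas_limit_cap)
    remaining_gas = tx.gas_limit
    split_transactions = []
    for i in range(num_splits):
        # The last copy takes the leftover gas; all others take the cap.
        gas = remaining_gas if i == num_splits - 1 else gas_limit_cap
        split_transactions.append(replace(tx, gas_limit=gas, nonce=tx.nonce + i))
        remaining_gas -= gas_limit_cap
    return split_transactions


txs = split_transaction(Tx(nonce=5, gas_limit=34_000_000, data=b"\x01"), 16_000_000)
```

Only `gas_limit` and `nonce` change between copies; `data` is carried over unchanged, which is why this approach cannot vary calldata per transaction.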

CC @LouisTsai-Csie

CPerezz · 2026-01-03T07:15:54Z

EXTCODESIZE/EXTCODECOPY/EXTCODEHASH attacks: The attack bytecode reads num_deployed_contracts from factory storage and iterates from salt 0 to that value. If we split into multiple identical transactions, each transaction would:

Start at salt 0
Access the same contracts again
Those contracts would be warm on subsequent transactions (defeating the purpose of cold access benchmarks)

CALL attack: Same issue; it iterates from salt 0 each time.
SLOAD/SSTORE attacks (in test_single_opcode.py): These call ERC20 balanceOf/approve on a fixed set of contracts. The same contracts get accessed in every transaction instead of varying.

So, to elaborate, BenchmarkTest.split_transaction cannot be used for these benchmarks because:

It creates identical transaction copies (same calldata)
Each copy would access the same addresses starting from salt 0
After the first transaction, those addresses are warm, not cold
This would fundamentally change what's being benchmarked
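
The effect can be illustrated with a toy model. This is only a sketch under a simplifying assumption: a contract counts as "warm" once any earlier transaction in the same block has touched it, which is the sense used in this comment. It is not a model of EVM gas accounting, and `count_cold_accesses` is a hypothetical helper.

```python
def count_cold_accesses(start_salts: list[int], contracts_per_tx: int) -> int:
    """Count contracts touched for the first time across a block's transactions."""
    touched: set[int] = set()
    cold = 0
    for start in start_salts:
        # Each transaction iterates over a contiguous range of salts.
        for salt in range(start, start + contracts_per_tx):
            if salt not in touched:
                cold += 1
                touched.add(salt)
    return cold


# Three identical copies all start at salt 0 and revisit the same contracts.
identical = count_cold_accesses([0, 0, 0], contracts_per_tx=100)
# Three transactions with distinct offsets touch fresh contracts each time.
offset = count_cold_accesses([0, 100, 200], contracts_per_tx=100)
```

In this model the identical copies yield 100 cold accesses out of 300, while the offset transactions yield 300 out of 300.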

The current manual transaction splitting approach in test_multi_opcode.py has the same issue though - the transactions are also identical. To properly benchmark cold access across multiple transactions, the attack contracts would need to:

Accept a starting salt offset in calldata
Each transaction would pass a different offset
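
One possible shape for the per-transaction calldata is sketched below. The encoding is an assumption made for illustration (a single 32-byte big-endian word holding the starting salt); the PR does not specify a format, and `offset_calldata` is a hypothetical helper.

```python
def offset_calldata(num_txs: int, contracts_per_tx: int) -> list[bytes]:
    """Build one 32-byte starting-salt word per transaction (assumed encoding)."""
    return [
        # Transaction i starts where transaction i - 1 stopped.
        (i * contracts_per_tx).to_bytes(32, "big")
        for i in range(num_txs)
    ]


calldata_per_tx = offset_calldata(num_txs=3, contracts_per_tx=100)
start_salts = [int.from_bytes(word, "big") for word in calldata_per_tx]
```

An attack contract written for this layout would read the word from calldata and begin iterating at that salt instead of at 0.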

Just in case, can you confirm this makes sense, @LouisTsai-Csie?

jochem-brouwer · 2026-01-03T07:33:17Z

Ah, I see. I gave this advice based on the diff, but I now understand that the attack contracts have to be updated so that they can pick up the attack at a given offset. Yes, the splitter cannot be used for non-identical transactions. However, I think at some point we should support the case where the transactions differ in calldata, since this is currently the go-to way to make attacks start at a specific offset. We could refactor later and add this feature to the transaction splitter, so that each transaction's calldata is derived from a start offset that increases for each new transaction.

danceratopz

Hey, @CPerezz! Could you please take a look at the comment on the overhead_per_contract in test_single_opcode? Once that's addressed, I think we can merge as-is and address any refactors in a follow-up PR.

tests/benchmark/stateful/bloatnet/test_single_opcode.py

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

…nceof Adds per-contract overhead calculation to accurately account for loop setup/teardown costs in the gas budget calculation.

CPerezz · 2026-01-06T22:42:07Z

I think everything is addressed, @danceratopz.

Could you merge if everything is indeed ok?

danceratopz

Thanks for the update @CPerezz! Excuse me being pedantic, but I was going to suggest some clean-up and came across the inconsistency below.

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

Replace arbitrary 1000 gas cleanup reservation with precise calculations for setup_overhead, cleanup_overhead, and loop_condition_overhead in test_multi_opcode.py tests. This aligns with the approach used in test_single_opcode.py.

Use correct EVM gas constants for opcode costs:

- G_BASE (2) -> G_VERY_LOW (3) for DUP, ISZERO, MLOAD, MSTORE, SUB
- G_MID (8) -> G_HIGH (10) for JUMPI
- G_LOW (5) -> G_VERY_LOW (3) for MLOAD, MSTORE

This fixes underestimation of loop_condition_overhead and overhead_per_contract calculations in test_multi_opcode.py and test_single_opcode.py.
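
The size of the underestimation can be shown with a small worked example. The opcode sequence below is hypothetical and chosen only to exercise the corrections listed in the commit message; the tier values are the ones stated there, and memory expansion costs are ignored.

```python
# Gas tiers with the values given in the commit message.
G_BASE, G_VERY_LOW, G_MID, G_HIGH = 2, 3, 8, 10

# Static cost per opcode before and after the correction.
BEFORE = {"DUP1": G_BASE, "ISZERO": G_BASE, "JUMPI": G_MID}
AFTER = {"DUP1": G_VERY_LOW, "ISZERO": G_VERY_LOW, "JUMPI": G_HIGH}

# Hypothetical loop-condition sequence, for illustration only.
SEQUENCE = ["DUP1", "ISZERO", "JUMPI"]


def static_gas(opcodes: list[str], table: dict[str, int]) -> int:
    """Sum the static gas of a sequence of opcodes."""
    return sum(table[op] for op in opcodes)


old_cost = static_gas(SEQUENCE, BEFORE)
new_cost = static_gas(SEQUENCE, AFTER)
underestimate = new_cost - old_cost
```

For this sequence the old constants give 12 gas and the corrected ones give 16, so each pass through such a check was undercounted by 4 gas.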

danceratopz

Hi @CPerezz, I think some of the gas constants used in the calculations and comments are incorrect. This is a real pain - we need to absolutely try to remove as much of this accounting burden from test authors as possible. I thought it might be easier to PR my suggestions here:

CPerezz#1

Otherwise, could you please address the remaining missing contract overhead? Sorry that these have landed in dribs and drabs!

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

…saka-suggestions fix(test-benchmark): update gas constants in calculations and comments

Account for per-contract loop setup/teardown overhead:

- SLOAD loop: MSTORE init + JUMPDEST + condition check (23 gas)
- SSTORE loop: MSTORE selector + MSTORE init + JUMPDEST + condition (26 gas)
- Total: 49 gas per contract

This aligns with the approach used in test_sload_empty_erc20_balanceof and test_sstore_erc20_approve.
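
The totals in this commit message can be reproduced with the corrected constants. The breakdown of the "condition check" into MLOAD + ISZERO + ISZERO + JUMPI is an assumption made here (the commit message only gives the totals); with that assumption the numbers match.

```python
# Gas tiers: G_VERY_LOW for MLOAD/MSTORE/ISZERO, G_HIGH for JUMPI.
G_VERY_LOW, G_HIGH, G_JUMPDEST = 3, 10, 1

# Assumed condition check: MLOAD + ISZERO + ISZERO + JUMPI.
condition_check = G_VERY_LOW + G_VERY_LOW + G_VERY_LOW + G_HIGH

# SLOAD loop: MSTORE init + JUMPDEST + condition check.
sload_loop_overhead = G_VERY_LOW + G_JUMPDEST + condition_check

# SSTORE loop: MSTORE selector + MSTORE init + JUMPDEST + condition check.
sstore_loop_overhead = G_VERY_LOW + G_VERY_LOW + G_JUMPDEST + condition_check

# A mixed test runs both loops once per contract.
overhead_per_contract = sload_loop_overhead + sstore_loop_overhead
```

This yields 23 gas for the SLOAD loop, 26 gas for the SSTORE loop, and 49 gas per contract, as stated in the commit message.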

…saka-suggestions-2 fix(tests-benchmark): apply `overhead_per_contract` in `test_mixed_sload_sstore`; fix `JUMPI` gas

danceratopz

Hi @CPerezz, let's try and get closure on this PR! 👍

Looks like we both added commits to address the missing overhead_per_contract. That's the main remaining thing to address in the scope of this PR. I would suggest we remove the second one added in bf5724a, could you please check the suggestion below that removes it?

I don't want to be overcritical, but unfortunately there are still issues in the gas accounting for overhead cost estimation (some pre-date this PR). One obvious offender is the superfluous 100 gas for the non-existent STATICCALL base cost.

I'm not sure how precise these calculations need to be for these tests, which target state growth (the attack is still valid). We should not address them in this PR, but if you think we should nail down exact costs, we could create a follow-up issue to review them. Perhaps you can comment on how important these costs are? A quick explanation in the next benchmarking call would be great. Thanks!

danceratopz · 2026-01-09T05:53:27Z

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

+    # Per-contract fixed overhead (setup + teardown for each contract's loop)
+    overhead_per_contract = (
+        gas_costs.G_VERY_LOW  # MSTORE to initialize counter (3)
+        + gas_costs.G_JUMPDEST  # JUMPDEST at loop start (1)
+        + gas_costs.G_VERY_LOW  # MLOAD for While condition check (3)
+        + gas_costs.G_BASE  # ISZERO (2)
+        + gas_costs.G_BASE  # ISZERO (2)
+        + gas_costs.G_MID  # JUMPI (8)
+        + gas_costs.G_BASE  # POP to clean up at end (2)
+    )
+


Suggested change

-    # Per-contract fixed overhead (setup + teardown for each contract's loop)
-    overhead_per_contract = (
-        gas_costs.G_VERY_LOW  # MSTORE to initialize counter (3)
-        + gas_costs.G_JUMPDEST  # JUMPDEST at loop start (1)
-        + gas_costs.G_VERY_LOW  # MLOAD for While condition check (3)
-        + gas_costs.G_BASE  # ISZERO (2)
-        + gas_costs.G_BASE  # ISZERO (2)
-        + gas_costs.G_MID  # JUMPI (8)
-        + gas_costs.G_BASE  # POP to clean up at end (2)
-    )

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

danceratopz · 2026-01-12T09:45:21Z

@CPerezz just a follow-up comment and heads up about the gas accounting. @marioevz just proposed the following to help simplify gas cost calculations:

feat(testing/forks): Implement bytecode.gas_cost(fork) #2002

I would merge this PR with the suggestions above and then add an issue to refactor the gas accounting with #2002, once it's merged. Happy to help or even take the lead on that. How does that sound?

danceratopz

Thanks for adding this @CPerezz!

CPerezz changed the base branch from forks/osaka to forks/amsterdam January 2, 2026 22:33

CPerezz force-pushed the feat/update-stateful-benches-osaka branch from 035ff6c to 2762e3a Compare January 2, 2026 22:47

CPerezz mentioned this pull request Jan 3, 2026

Migrate benchmark test to Osaka compatibility ethpandaops/gas-lighting-tracker#19

Open

danceratopz reviewed Jan 6, 2026

tests/benchmark/stateful/bloatnet/test_single_opcode.py (outdated)

tests/benchmark/stateful/bloatnet/test_multi_opcode.py

Carsons-Eels pushed a commit to Carsons-Eels/execution-specs that referenced this pull request Jan 6, 2026

chore(ci): bump to use latest eels for all. (ethereum#1962)

efb89f8

fix: add missing overhead_per_contract to test_sload_empty_erc20_bala…

2c8aded

…nceof Adds per-contract overhead calculation to accurately account for loop setup/teardown costs in the gas budget calculation.

CPerezz requested a review from danceratopz January 6, 2026 22:41

danceratopz reviewed Jan 7, 2026

tests/benchmark/stateful/bloatnet/test_multi_opcode.py (outdated)

CPerezz requested a review from danceratopz January 7, 2026 14:38

danceratopz self-assigned this Jan 7, 2026

CPerezz mentioned this pull request Jan 7, 2026

feat(benchmark): add EXTCODESIZE bytecode size benchmark for cold access testing #1961

Merged

8 tasks

danceratopz reviewed Jan 8, 2026

tests/benchmark/stateful/bloatnet/test_multi_opcode.py (outdated)

Merge pull request #1 from danceratopz/feat/update-stateful-benches-o…

63e7322

…saka-suggestions fix(test-benchmark): update gas constants in calculations and comments

CPerezz requested a review from danceratopz January 8, 2026 10:38

danceratopz added 2 commits January 8, 2026 12:02

fix(tests-benchmark): use G_HIGH instead of G_MID for JUMPI.

4372aca

danceratopz mentioned this pull request Jan 8, 2026

Automate cold-access transaction splitting for stateful benchmarks #1991

Open

CPerezz and others added 2 commits January 8, 2026 16:20

fix: add missing overhead_per_contract to test_mixed_sload_sstore

bf5724a

Merge pull request #2 from danceratopz/feat/update-stateful-benches-o…

bc026bb

…saka-suggestions-2 fix(tests-benchmark): apply `overhead_per_contract` in `test_mixed_sload_sstore`; fix `JUMPI` gas

danceratopz reviewed Jan 9, 2026

fix: remove superfluous STATICCALL base cost and fix JUMPI gas constant

5b30959

fix: remove duplicate overhead_per_contract in test_mixed_sload_sstore

186382d

danceratopz approved these changes Jan 14, 2026

danceratopz merged commit 2c83b84 into ethereum:forks/amsterdam Jan 14, 2026
15 checks passed

LouisTsai-Csie mentioned this pull request Jan 22, 2026

refactor(testing): Implement IteratingBytecode, FixedIterationsBytecode #2030

Open

8 tasks
