refactor(benchmark): update to benchmark test wrapper #2160

LouisTsai-Csie · 2025-09-16T10:53:14Z

🗒️ Description

Enhanced Interface

In the original benchmark helpers, the repeated code pattern is structured as:

<setup><JUMPDEST><attack><attack>...<attack><JUMP>

This PR adds a new cleanup section at the end of each iteration, right before the JUMP back, and the interface changes accordingly:

<setup><JUMPDEST><attack><attack>...<attack><cleanup><JUMP>
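The sketch below builds the new loop layout from raw EVM bytes. It is not the JumpLoopGenerator implementation. `build_jump_loop` and `push2` are hypothetical helpers, and the sketch assumes the EIP-170 code size limit of 24,576 bytes is the only bound on the number of repetitions. The JUMP target is the JUMPDEST offset, which equals `len(setup)`.

```python
# Hypothetical sketch of the <setup><JUMPDEST><attack>...<cleanup><JUMP> layout.
# Not the real JumpLoopGenerator; opcodes are raw EVM byte values.
JUMPDEST = bytes([0x5B])
JUMP = bytes([0x56])
PUSH2 = 0x61
MAX_CODE_SIZE = 24576  # EIP-170 contract code size limit


def push2(value: int) -> bytes:
    """Encode PUSH2 <value> with a big-endian 2-byte immediate."""
    return bytes([PUSH2]) + value.to_bytes(2, "big")


def build_jump_loop(
    setup: bytes,
    attack_block: bytes,
    cleanup: bytes = b"",
    max_code_size: int = MAX_CODE_SIZE,
) -> bytes:
    """Repeat attack_block as many times as fits, then clean up and jump back."""
    if not attack_block:
        raise ValueError("attack_block must not be empty")
    loop_start = len(setup)  # offset of the JUMPDEST, i.e. the jump target
    tail = cleanup + push2(loop_start) + JUMP
    available = max_code_size - len(setup) - len(JUMPDEST) - len(tail)
    repetitions = available // len(attack_block)
    if repetitions < 1:
        raise ValueError("attack_block does not fit in the code size limit")
    return setup + JUMPDEST + attack_block * repetitions + tail
```

For example, `build_jump_loop(bytes([0x5F]), bytes([0x15]))` produces PUSH0, then JUMPDEST, then as many ISZERO bytes as fit, and finally `PUSH2 0x0001 JUMP` back to offset 1.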

Refactoring target

Updating to the benchmark wrapper

Update tests to use the benchmark test wrapper so that the EIP-7825 transaction gas limit cap is handled.
Use JumpLoopGenerator or ExtCallGenerator for repeated patterns.
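This sketch illustrates the EIP-7825 scenario by splitting a benchmark gas budget into transactions that each stay within the 2**24 (16,777,216) gas cap. `split_gas_budget` is a hypothetical helper. The source does not say how the wrapper actually distributes gas, so treat this as an assumption.

```python
# Hypothetical sketch: split a benchmark gas budget under the EIP-7825 cap.
TX_GAS_LIMIT_CAP = 2**24  # 16,777,216 gas, the EIP-7825 per-transaction cap


def split_gas_budget(gas_benchmark_value: int, cap: int = TX_GAS_LIMIT_CAP) -> list[int]:
    """Return per-transaction gas limits that sum to the budget, each <= cap."""
    if gas_benchmark_value <= 0 or cap <= 0:
        raise ValueError("budget and cap must be positive")
    full, remainder = divmod(gas_benchmark_value, cap)
    limits = [cap] * full  # as many capped transactions as fit
    if remainder:
        limits.append(remainder)  # one smaller transaction for the leftover gas
    return limits
```

A real splitter would also need to avoid a trailing remainder below the 21,000 gas intrinsic cost. This sketch does not handle that case.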

Unifying the variable names

Use setup for the code pattern before the bytecode loop (instead of code_prefix, calldata).
Use attack_block for the actual benchmark execution pattern (instead of code_loop_body, attack_iter, code_sequence, loop_body, code_segment, op_sequence, or iter_block).
Use cleanup for the phase after the execution loop (instead of code_suffix, code_loop_footer).
Use gas_benchmark_value for the benchmark gas limit (instead of attack_gas_limit).
Use tx_gas_limit for the transaction gas limit cap (instead of tx_gas_limit_cap). Avoid setting it explicitly, since the wrapper handles this scenario automatically.
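The renaming above can be captured as a lookup table, for example to flag leftover legacy names during review. This is a hypothetical helper, not part of the repository. `calldata` is left out on purpose: it is also a legitimate transaction field name and would produce false positives.

```python
import re

# Hypothetical lookup built from the renaming list above (legacy -> unified).
# "calldata" is left out on purpose: it is also a legitimate transaction field.
LEGACY_TO_UNIFIED = {
    "code_prefix": "setup",
    "code_loop_body": "attack_block",
    "attack_iter": "attack_block",
    "code_sequence": "attack_block",
    "loop_body": "attack_block",
    "code_segment": "attack_block",
    "op_sequence": "attack_block",
    "iter_block": "attack_block",
    "code_suffix": "cleanup",
    "code_loop_footer": "cleanup",
    "attack_gas_limit": "gas_benchmark_value",
    "tx_gas_limit_cap": "tx_gas_limit",
}

# \b prevents matching "loop_body" inside "code_loop_body" ("_" is a word char).
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LEGACY_TO_UNIFIED)) + r")\b")


def find_legacy_names(source: str) -> dict[str, str]:
    """Return legacy identifiers found in source, mapped to their unified names."""
    return {name: LEGACY_TO_UNIFIED[name] for name in _PATTERN.findall(source)}
```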

Removing redundant parameters

Remove gas_benchmark_value as it is configured automatically
Remove the fork and env parameters when the function does not use them.

Implementation Note

Most of the test logic remains the same, but the following cases differ and may be worth a closer look from reviewers.

test_worst_returndatasize_zero
test_worst_keccak
test_worst_binop_simple
test_worst_unop
test_worst_calldatacopy
test_block_full_data
test_worst_blockhash

Some test cases have not been refactored, as they would require significantly more effort and may even need to be rewritten. I suggest skipping these for now and updating them in a separate PR:

test_worst_bytecode_single_opcode

Additionally, some test cases may fail on the Osaka fork:

test_worst_address_state_cold
test_worst_selfdestruct_existing
test_worst_selfdestruct_created
test_worst_selfdestruct_initcode

🔗 Related Issues or PRs

Requires PR #1956 and PR #1945
Relevant discussion: issue ethereum/execution-specs#1557

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

LouisTsai-Csie · 2025-09-29T09:42:39Z

tests/benchmark/test_worst_compute.py

        pre=pre,
        post={},
-        tx=tx,
+        code_generator=JumpLoopGenerator(setup=setup, attack_block=Op.POP(Op.RETURNDATASIZE)),
    )


 def test_worst_returndatasize_zero(


For the test_worst_returndatasize_zero case, the implementation has changed.

In the original version, the logic followed a POP-and-PUSH pattern. It can be refactored so that contract A only executes RETURNDATASIZE repeatedly until it reaches the maximum stack height, while contract B repeatedly STATICCALLs contract A.

In this refactored approach, we can use ExtCallGenerator as a helper.
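This sketch shows what contract A could look like under this approach. It assumes the EVM stack limit of 1024 items and raw opcode bytes (RETURNDATASIZE = 0x3D, STOP = 0x00). `build_returndatasize_target` and `final_stack_height` are hypothetical names, and contract B's STATICCALL loop is left out. Each RETURNDATASIZE pushes one item and nothing pops, so exactly 1024 of them fill the stack without overflowing.

```python
# Hypothetical sketch of contract A: RETURNDATASIZE repeated up to the stack limit.
RETURNDATASIZE = 0x3D  # pops 0, pushes 1
STOP = 0x00
MAX_STACK_HEIGHT = 1024  # EVM stack limit


def build_returndatasize_target(max_stack_height: int = MAX_STACK_HEIGHT) -> bytes:
    """One RETURNDATASIZE per free stack slot, no POPs, then STOP."""
    return bytes([RETURNDATASIZE]) * max_stack_height + bytes([STOP])


def final_stack_height(code: bytes) -> int:
    """Tiny interpreter for just these two opcodes; raises on stack overflow."""
    height = 0
    for op in code:
        if op == RETURNDATASIZE:
            height += 1
            if height > MAX_STACK_HEIGHT:
                raise OverflowError("stack overflow")
        elif op == STOP:
            break
        else:
            raise ValueError(f"unexpected opcode 0x{op:02x}")
    return height
```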

Related issue https://github.com/ethereum/execution-spec-tests/issues/1968

LouisTsai-Csie · 2025-09-29T09:47:32Z

tests/benchmark/test_worst_compute.py


-    state_test(
+    benchmark_test(
        pre=pre,
        post={},
        tx=tx,
    )


 def test_worst_keccak(


For test_worst_keccak, the implementation has changed.

The original pattern:

JUMPDEST PUSH20 length loopcode loopcode loopcode POP PUSH0 JUMP

This can be refactored as follows to avoid the additional POP on every iteration. PUSH20 length now runs once, before the JUMPDEST:

PUSH20 length JUMPDEST loopcode loopcode loopcode PUSH0 JUMP

The second pattern follows the JumpLoopGenerator structure.
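This sketch makes the saving concrete by counting the fixed (non-attack) opcodes executed per iteration in the two patterns above. It is a text-level approximation. Uppercase tokens are treated as opcodes, lowercase tokens as attack blocks or push immediates, and everything before JUMPDEST as one-time setup. `loop_overhead` is a hypothetical helper.

```python
# Hypothetical text-level sketch: count per-iteration overhead opcodes.
ORIGINAL = "JUMPDEST PUSH20 length loopcode loopcode loopcode POP PUSH0 JUMP"
REFACTORED = "PUSH20 length JUMPDEST loopcode loopcode loopcode PUSH0 JUMP"


def loop_overhead(pattern: str) -> int:
    """Count opcodes from JUMPDEST to the end, excluding attack blocks and immediates."""
    tokens = pattern.split()
    start = tokens.index("JUMPDEST")  # tokens before this run once (setup)
    # Uppercase tokens are opcodes; lowercase ones are loopcode or push immediates.
    return sum(1 for tok in tokens[start:] if tok.isupper())
```

Under these assumptions, the refactored loop drops PUSH20 and POP from every iteration, leaving only JUMPDEST, PUSH0 and JUMP.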


LouisTsai-Csie · 2025-09-29T09:59:01Z

tests/benchmark/test_worst_compute.py

@@ -2083,7 +1443,7 @@ def test_worst_jumpdests(
    ids=lambda param: "" if isinstance(param, tuple) else param,
 )
 def test_worst_binop_simple(


For the test_worst_binop_simple case, the implementation has changed.

TBD

LouisTsai-Csie · 2025-09-29T10:03:15Z

tests/benchmark/test_worst_compute.py

@@ -2122,35 +1474,21 @@ def test_worst_binop_simple(

 @pytest.mark.parametrize("opcode", [Op.ISZERO, Op.NOT])
 def test_worst_unop(


For test_worst_unop, the implementation has changed.

Original pattern:

JUMPDEST PUSH0 loopcode loopcode ... loopcode POP PUSH0 JUMP

It is changed to the following and updated to JumpLoopGenerator helper:

PUSH0 JUMPDEST loopcode loopcode ... loopcode POP JUMP
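ISZERO and NOT each pop one item and push one item, so a single stack slot can be reused across all iterations. That is why PUSH0 can move into the setup. The sketch below simulates this on 256-bit words. `run_unop_loop`, `iszero` and `bitwise_not` are hypothetical helpers, not test-suite code.

```python
from collections.abc import Callable

WORD_MASK = (1 << 256) - 1  # EVM words are 256-bit


def iszero(x: int) -> int:
    return 1 if x == 0 else 0


def bitwise_not(x: int) -> int:
    return ~x & WORD_MASK  # NOT flips all 256 bits


def run_unop_loop(op: Callable[[int], int], iterations: int, initial: int = 0) -> list[int]:
    """Simulate PUSH0 (setup) followed by `iterations` unary ops on one stack slot."""
    stack = [initial]  # setup: PUSH0, executed once before the loop
    for _ in range(iterations):
        stack.append(op(stack.pop()))  # unary op: pops 1, pushes 1
    return stack
```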

LouisTsai-Csie · 2025-09-29T10:05:32Z

tests/benchmark/test_worst_blocks.py


-    state_test(
+@pytest.mark.parametrize("zero_byte", [True, False])
+def test_block_full_data(


For test_block_full_data, the implementation has been updated to apply the EIP-7825 transaction gas limit cap.
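This sketch shows how the cap bounds a single full-data transaction by computing the largest calldata that fits under 2**24 gas. It assumes EIP-7623 calldata pricing: a floor of 10 gas per token, where a zero byte counts as 1 token and a non-zero byte as 4. It also assumes the transaction performs no execution, so the floor cost applies. `max_calldata_bytes` is a hypothetical helper, and `zero_byte` mirrors the test's parametrization.

```python
# Hypothetical sketch: max calldata per transaction under the EIP-7825 cap,
# assuming EIP-7623 floor pricing and a transaction with no execution.
TX_GAS_LIMIT_CAP = 2**24   # EIP-7825 per-transaction cap
TX_BASE_COST = 21_000      # intrinsic cost of any transaction
FLOOR_COST_PER_TOKEN = 10  # EIP-7623 TOTAL_COST_FLOOR_PER_TOKEN
TOKENS_PER_ZERO_BYTE = 1
TOKENS_PER_NONZERO_BYTE = 4


def max_calldata_bytes(zero_byte: bool, gas_limit: int = TX_GAS_LIMIT_CAP) -> int:
    """Largest calldata length whose EIP-7623 floor cost fits in gas_limit."""
    tokens_per_byte = TOKENS_PER_ZERO_BYTE if zero_byte else TOKENS_PER_NONZERO_BYTE
    gas_per_byte = FLOOR_COST_PER_TOKEN * tokens_per_byte
    return (gas_limit - TX_BASE_COST) // gas_per_byte
```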

LouisTsai-Csie · 2025-09-29T10:06:50Z

tests/benchmark/test_worst_stateful_opcodes.py

    )


 def test_worst_blockhash(


The implementation of test_worst_blockhash has been updated.

It now uses ExtCallGenerator to optimize the case and applies the EIP-7825 transaction gas limit cap.

LouisTsai-Csie · 2025-10-10T15:40:00Z

The changes in this PR are large, but they can be reviewed in smaller parts:

benchmark_code_generator.py & benchmark.py: These two files update the core helper functions.

The files below contain fewer test cases. It would be helpful if someone could review some of them first, so that I can apply the initial feedback to the remaining cases.

The largest file is listed below. Most of its changes are identical, and I have already documented above the tests that contain logic changes.

test_worst_compute (32)

LouisTsai-Csie · 2025-10-10T16:19:13Z

The opcode count comparison result: https://gist.github.com/LouisTsai-Csie/74189c969201871d641a785ab9966090

marioevz · 2025-10-10T22:41:28Z

Pushed a commit that changes the following:

Made deploy_contracts and generate_transaction in BenchmarkCodeGenerator require keyword arguments.
Added a tx_kwargs field to BenchmarkCodeGenerator that can be used to add extra values to the benchmark transaction (such as its data, value, or blobs). This removed most generate_transaction calls from the tests: only BenchmarkTest now calls this function internally, so its usage is mostly abstracted away from the user. See before and after.
Removed the need to pass pre or post to benchmark_test, so usages are cleaner throughout the tests.
Made setup and cleanup default to empty bytecode (Bytecode()) in the methods that take them, so they don't always have to be passed.
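Here is a hypothetical sketch of the resulting shape, for illustration only. `SketchCodeGenerator` is not the real BenchmarkCodeGenerator, and its fields and the `to`/`gas_limit` parameters are assumptions. It shows keyword-only arguments via a bare `*`, empty defaults for setup and cleanup (standing in for `Bytecode()`), and `tx_kwargs` merged into the generated transaction.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchCodeGenerator:
    """Hypothetical stand-in, not the real BenchmarkCodeGenerator."""

    attack_block: bytes
    setup: bytes = b""    # empty by default, like Bytecode()
    cleanup: bytes = b""  # empty by default, like Bytecode()
    tx_kwargs: dict[str, Any] = field(default_factory=dict)

    def generate_transaction(self, *, to: str, gas_limit: int) -> dict[str, Any]:
        # The bare `*` makes every parameter keyword-only.
        # Extra fields (data, value, blobs, ...) come from tx_kwargs; the core
        # fields are applied last so tx_kwargs cannot override them.
        return {**self.tx_kwargs, "to": to, "gas_limit": gas_limit}
```

This sketch applies the core fields after `tx_kwargs` on purpose, so extra values cannot silently override them.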

marioevz

LGTM.

I left only two comments but it's fine by me if we instead create issues for them.

marioevz · 2025-10-11T00:02:27Z

src/ethereum_test_benchmark/benchmark_code_generator.py

        )

        # Deploy target contract that contains the actual attack block
        self._target_contract_address = pre.deploy_contract(


My comment is actually on the max_iterations line: I think the fork.max_stack_height part of the min only holds if the attack_block pushes exactly one item to the stack. If it pushes zero items, we can keep going; if it pushes more than one item, we will overflow the stack.

Bytecode actually has properties that could help make this more fail-safe and are automatically calculated:

execution-spec-tests/src/ethereum_test_vm/bytecode.py

Lines 35 to 38 in a9fc9ee

```
popped_stack_items: int
pushed_stack_items: int
max_stack_height: int
min_stack_height: int
```
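This sketch shows how these properties could bound the loop, under stated assumptions. `pushed` and `popped` correspond to pushed_stack_items and popped_stack_items. `block_peak` corresponds to max_stack_height, assuming it is measured relative to the height at which the block starts. `max_iterations_for_stack` is a hypothetical name, not the code in benchmark_code_generator.py.

```python
from typing import Optional

STACK_LIMIT = 1024  # EVM maximum stack height


def max_iterations_for_stack(
    pushed: int, popped: int, block_peak: int, stack_limit: int = STACK_LIMIT
) -> Optional[int]:
    """Repetitions of a block that fit before the stack overflows (None = unbounded)."""
    if block_peak > stack_limit:
        return 0  # a single repetition already overflows
    net = pushed - popped  # stack growth per repetition
    if net <= 0:
        return None  # stack never grows: no overflow bound (underflow not checked)
    # After k repetitions the height is k * net; repetition k + 1 peaks at
    # k * net + block_peak, which must stay within stack_limit.
    return (stack_limit - block_peak) // net + 1
```

A block with zero net growth returns None, which matches the observation that such a block can keep going. Blocks that shrink the stack would need a separate underflow check.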

marioevz · 2025-10-11T00:07:45Z

tests/benchmark/test_worst_stateful_opcodes.py

-    # Always ask for the oldest allowed BLOCKHASH block.
-    execution_code = Op.PUSH1(1) + While(
-        body=Op.POP(Op.BLOCKHASH(Op.DUP1)),
+    code = ExtCallGenerator(attack_block=Op.BLOCKHASH(1)).generate_repeated_code(


I think we could add a setup_blocks field to BenchmarkTest whose blocks are prepended to the blockchain, so that BenchmarkCodeGenerator only inserts the last block.

We should avoid these workarounds that don't use the code_generator field (of BenchmarkTest) as much as possible.
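Setup blocks matter here because BLOCKHASH only returns a real hash for the 256 most recent blocks. The sketch below computes how many setup blocks would have to precede the benchmark block. It assumes genesis is block 0 and that the benchmark block should make block 1 the oldest accessible one, as the original "oldest allowed" comment suggests. Both helpers are hypothetical; setup_blocks is only the field name proposed above.

```python
BLOCKHASH_WINDOW = 256  # BLOCKHASH only sees the 256 most recent blocks


def blockhash_available(queried: int, current: int) -> bool:
    """True if BLOCKHASH(queried) returns a real hash when executed in block `current`."""
    return current - BLOCKHASH_WINDOW <= queried < current


def setup_blocks_for_oldest(queried: int = 1) -> int:
    """Setup blocks (numbered 1 .. N - 1 after genesis) so BLOCKHASH(queried) is the oldest allowed in block N."""
    benchmark_block = queried + BLOCKHASH_WINDOW
    return benchmark_block - 1
```

Under these assumptions, 256 setup blocks (numbers 1 to 256) would precede benchmark block 257.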

marioevz · 2025-10-11T00:09:06Z

tests/benchmark/test_worst_compute.py


-    state_test(
+    benchmark_test(
        pre=pre,
        post={},
        tx=tx,
    )


 def test_worst_keccak(


It's a bit odd, then, that the opcode count seems unchanged according to the opcode count file.

LouisTsai-Csie self-assigned this Sep 16, 2025

LouisTsai-Csie changed the title ~~Refactor benchmark tests~~ refactor(benchmark): update to benchmark test wrapper Sep 16, 2025

LouisTsai-Csie mentioned this pull request Sep 17, 2025

feat(benchmark): add benchmark_test test type #1945 (merged)

LouisTsai-Csie force-pushed the refactor-benchmark-tests branch 2 times, most recently from eb1fc44 to 619f1cf September 19, 2025 07:11

LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from 41bc03e to cc03678 September 25, 2025 06:08

LouisTsai-Csie marked this pull request as ready for review September 25, 2025 08:48

LouisTsai-Csie force-pushed the refactor-benchmark-tests branch 2 times, most recently from beb239d to b51eba1 September 29, 2025 09:24


danceratopz mentioned this pull request Sep 29, 2025

All Core Devs - Testing (ACDT) #55, Sep 29, 2025 ethereum/pm#1736 (closed)

LouisTsai-Csie added the feature:benchmark and type:refactor labels Sep 30, 2025

spencer-tb mentioned this pull request Oct 9, 2025

tracker(fork): osaka fork mega meta issue ethereum/execution-specs#1558 (open)

LouisTsai-Csie added 11 commits October 10, 2025 12:09

81be606 refactor(benchmark): update code generator interface
4ac3005 refactor(benchmark): update worst bytecode scenario
8502e69 refactor(benchmark): update worst compute scenario
22c0712 refactor(benchmark): update worst memory scenario
9f94d47 refactor(benchmark): update worst opcode scenario
9f50834 refactor(benchmark): update worst statefule scenario
4664c31 refactor(benchmark): update modexp cases by parameterization
cbcac72 refactor: remove blockchain filler in worst block cases
50916a1 refactor: update blockhash case
bb20897 fix: resolve failing blob and block has tests
41f0caa fix: update linting comment length
50b6b30 fix linting and typing issue

LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from b51eba1 to 50b6b30 October 10, 2025 04:32

12aceb8 fix incorrect logic for missing coverage
d77d3a4 refactor unop worst case

LouisTsai-Csie force-pushed the refactor-benchmark-tests branch from 5d5ffda to d77d3a4 October 10, 2025 16:14

e2f2024 refactor(tests/benchmark): Optimize generators usages

marioevz approved these changes Oct 11, 2025
