CPerezz (Contributor) commented Jan 1, 2026

🗒️ Description

Test EXTCODESIZE with parametrized bytecode sizes using a CREATE2 factory.

This test executes EXTCODESIZE operations against pre-deployed contracts
via factories, measuring the performance impact of different contract
sizes on EXTCODESIZE operations.
Designed for execute mode only - contracts must be pre-deployed.

The test maximizes cold EXTCODESIZE calls to stress client state loading:

  1. Using CREATE2 address derivation to access many unique contracts
  2. Filling block gas with transactions close to FUSAKA_TX_GAS_LIMIT (16M)
  3. Verifying contracts exist by checking the last accessed contract's size

This benchmark measures the performance impact of EXTCODESIZE operations
on contracts of varying sizes (0.5KB to 24KB).
It stresses client state loading by maximizing cold EXTCODESIZE calls.

Overview

The test deploys attack contracts that loop through thousands of unique
contract addresses, calling EXTCODESIZE on each.
By using CREATE2 address derivation, the test accesses pre-deployed
contracts without storing their addresses, maximizing the number of cold
state accesses per block.
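To make the CREATE2 derivation concrete, here is a stdlib-only Python sketch of the 85-byte preimage (0xff ++ factory ++ salt ++ initcode hash). It also shows one memory layout in which a single Op.SHA3(11, 85), as used in the attack bytecode reviewed later in this thread, hashes exactly those bytes. The layout and the helper names are assumptions for illustration, not the test's actual code. The Keccak-256 function is passed in because Python's hashlib.sha3_256 is NIST SHA3, not Keccak-256.

```python
from typing import Callable


def create2_preimage(factory: bytes, salt: int, initcode_hash: bytes) -> bytes:
    """Return 0xff ++ factory (20 bytes) ++ salt (32) ++ initcode_hash (32)."""
    if len(factory) != 20 or len(initcode_hash) != 32:
        raise ValueError("factory must be 20 bytes, initcode_hash 32 bytes")
    return b"\xff" + factory + salt.to_bytes(32, "big") + initcode_hash


def attack_memory(factory: bytes, salt: int, initcode_hash: bytes) -> bytes:
    """Assumed 96-byte memory image of the attack contract (hypothetical)."""
    mem = bytearray(96)
    mem[12:32] = factory  # MSTORE(0, factory): address right-aligned in word 0
    mem[11] = 0xFF  # assumed MSTORE8(11, 0xff): prefix just before the address
    mem[32:64] = salt.to_bytes(32, "big")  # MSTORE(32, salt)
    mem[64:96] = initcode_hash  # assumed MSTORE(64, initcode_hash)
    return bytes(mem)


def create2_address(preimage: bytes, keccak256: Callable[[bytes], bytes]) -> bytes:
    """CREATE2 address = low 20 bytes of keccak256(preimage)."""
    return keccak256(preimage)[12:]
```

With this layout, `attack_memory(...)[11:96]` equals `create2_preimage(...)`, so incrementing only the salt word is enough to walk through consecutive CREATE2 addresses.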

┌────────────────────────────────────────────────────────┐
│ Test Block                                             │
├────────────────────────────────────────────────────────┤
│ TX1: Verification (~30K gas)                           │
│  └─> Calls EXTCODESIZE on last contract, stores result │
│                                                        │
│ TX2: Attack (~16M gas)                                 │
│  └─> Loops EXTCODESIZE on salts 0..5,878               │
│                                                        │
│ TX3: Attack (~16M gas)                                 │
│  └─> Loops EXTCODESIZE on salts 5,879..11,757          │
│                                                        │
│ TX4: Attack (~16M gas)                                 │
│  └─> Loops EXTCODESIZE on salts 11,758..17,636         │
└────────────────────────────────────────────────────────┘
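The salt windows in the diagram are contiguous and disjoint, so each attack transaction touches a fresh set of contracts and every EXTCODESIZE stays cold. A small Python sketch that reproduces them; salts_per_tx = 5,879 is read off the diagram rather than derived from gas costs, and salt_ranges is a hypothetical helper:

```python
def salt_ranges(num_txs: int, salts_per_tx: int) -> list[tuple[int, int]]:
    """Inclusive (first, last) salt window covered by each attack transaction."""
    return [
        (i * salts_per_tx, (i + 1) * salts_per_tx - 1) for i in range(num_txs)
    ]


ranges = salt_ranges(num_txs=3, salts_per_tx=5_879)
total_cold_accesses = ranges[-1][1] + 1  # salts are numbered from 0
```

This gives `[(0, 5878), (5879, 11757), (11758, 17636)]`, i.e. 17,637 cold accesses per block, matching the figure quoted in the commit messages.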

Execute a Single Size

uv run execute remote \
  --fork Prague \
  --rpc-endpoint http://127.0.0.1:8545 \
  --rpc-seed-key <SEED_KEY> \
  --rpc-chain-id 1337 \
  --address-stubs tests/benchmark/stateful/bloatnet/stubs.json \
  -- -m stateful --gas-benchmark-values 60 \
  tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py \
  -k '24KB' -v

Execute All Sizes

uv run execute remote \
  --fork Prague \
  --rpc-endpoint http://127.0.0.1:8545 \
  --rpc-seed-key <SEED_KEY> \
  --rpc-chain-id 1337 \
  --address-stubs tests/benchmark/stateful/bloatnet/stubs.json \
  -- -m stateful --gas-benchmark-values 60 \
  tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py -v

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.


@LouisTsai-Csie LouisTsai-Csie self-requested a review January 2, 2026 08:30
codecov bot commented Jan 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.33%. Comparing base (d618e52) to head (4790f73).
⚠️ Report is 14 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #1961   +/-   ##
================================================
  Coverage            86.33%   86.33%           
================================================
  Files                  538      538           
  Lines                34557    34557           
  Branches              3222     3222           
================================================
  Hits                 29835    29835           
  Misses                4148     4148           
  Partials               574      574           
Flag        Coverage Δ
unittests   86.33% <ø> (ø)

Flags with carried forward coverage won't be shown.


jochem-brouwer (Member) left a comment

Saw this PR, some small comments/ideas 😄 👍

CPerezz added 10 commits January 6, 2026 01:14
Support for multiple contract sizes (0.5KB to 24KB) with automatic
splitting of gas budget into 16M transactions respecting Fusaka limit.
Includes deployment tools for test infrastructure setup.
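A minimal sketch of splitting a gas budget into per-transaction limits under the cap. It assumes the cap is 2^24 = 16,777,216 gas (the EIP-7825 value that a later commit in this PR switches to via fork.transaction_gas_limit_cap()) and that any remainder becomes one smaller transaction. split_gas_budget is a hypothetical name, and the PR's own code may handle the remainder differently.

```python
TX_GAS_LIMIT_CAP = 2**24  # 16,777,216 gas (assumed EIP-7825 cap)


def split_gas_budget(total_gas: int, cap: int = TX_GAS_LIMIT_CAP) -> list[int]:
    """Split a gas budget into transaction gas limits, each at most `cap`."""
    if total_gas <= 0 or cap <= 0:
        raise ValueError("total_gas and cap must be positive")
    full, remainder = divmod(total_gas, cap)
    return [cap] * full + ([remainder] if remainder else [])
```

For a 60M budget (assuming `--gas-benchmark-values 60` means 60 million gas), this yields three 16,777,216-gas transactions plus one of 9,668,352 gas.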
- Remove unused calculate_init_hashes.py script
- Fix nonce handling by using pending tx count to avoid "already known" errors
- Use actual network block gas limit instead of hardcoded Fusaka limit
- Add fixed gas price for dev mode compatibility
- Fix size key naming for integer sizes (e.g., 1.0 -> "1kb" not "1.0kb")
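The size-key fix can be sketched as follows; size_key is a hypothetical helper name, and only the 1.0 -> "1kb" behaviour comes from the commit message:

```python
def size_key(size_kb: float) -> str:
    """Format a contract size in KB as a key, dropping a trailing '.0'."""
    if float(size_kb).is_integer():
        return f"{int(size_kb)}kb"  # 1.0 -> "1kb", 24 -> "24kb"
    return f"{size_kb}kb"  # 0.5 -> "0.5kb"
```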
Rewrite the test to maximize cold EXTCODESIZE calls per block:
- Add attack contract that loops EXTCODESIZE on CREATE2-derived addresses
- Each tx passes a starting salt via calldata to access unique contracts
- Fill blocks with 3x 16M gas transactions (~17,637 cold accesses/block)
- Add verification tx that stores EXTCODESIZE result in storage
- Add 2KB size to test parameters
- Calculate gas per iteration dynamically from fork gas costs

The test now properly stresses client state loading by ensuring all
EXTCODESIZE operations are cold (first access to each contract).

Add comprehensive docstring explaining the benchmark architecture,
execution commands, and block structure with cold access guarantees.

These utility scripts are for local deployment only and don't need
to be part of the test suite. They contain linting issues that are
not worth fixing for non-test utility code.

- Wrap long lines in docstrings to comply with 79 char limit
- Use raw docstring for bash command examples with backslashes
- Shorten comments to fit line length requirements

Add type annotations for GasCosts and Address parameters to satisfy
mypy strict type checking.

EVM automatically truncates to 20-byte address when executing EXTCODESIZE,
so the PUSH20(0xFF...FF) + AND masking is unnecessary.

Ref: https://github.com/ethereum/go-ethereum/blob/b635e063/core/vm/instructions.go#L337
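A Python model of the operand conversion this commit relies on: only the low 20 bytes (160 bits) of the 256-bit stack word are used as the address, so an explicit AND with a 20-byte 0xFF...FF mask changes nothing. This models EVM semantics for illustration; it is not client code, and the function name is hypothetical.

```python
ADDRESS_BITS = 160  # 20 bytes
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def word_to_address(word: int) -> bytes:
    """Model how an address operand is taken from a 256-bit stack word."""
    if not 0 <= word < (1 << 256):
        raise ValueError("stack words are unsigned 256-bit integers")
    return (word & ADDRESS_MASK).to_bytes(20, "big")
```

Masking first yields the same 20 bytes, which is why the PUSH20 + AND pair could be dropped.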
Replace the PUSH2(0x1000) + JUMPI pattern (which relied on jumping to an
invalid offset to trigger failure) with an explicit Conditional + REVERT.

This is cleaner, more explicit about error handling intent, and doesn't
rely on undefined behavior of invalid jump destinations.
@CPerezz CPerezz force-pushed the feat/bytecode-size-benches branch from 14cccd8 to 628f140 January 6, 2026 00:17
jochem-brouwer (Member) left a comment

General suggestion to remove the need for gas calculations. I think this kind of pattern is something we should at some point make into a template (not necessary now, but just noting for future cleanup). I think this kind of attack pattern (execute certain code against a CREATE2 factory) is something which we want to use more often and could reuse.
As mentioned in the comments it is not perfect, but the overhead is small enough compared to what we are measuring here (cold account access).

General Q: where do I find the initcode of the deployed contracts? I understand that we can choose any code here (via the stubs), but would like to see what contracts are initialized.

Have added a suggestion to directly add the other operations EXTCODEx and xCALLxs here 😄 👍

+ Op.PUSH1(32)
+ Op.MSTORE # Store starting_salt at memory[32]
# Stack: [num_deployed, starting_salt]
+ Op.POP # Stack: [num_deployed]
Member

So the counter here is num_deployed - right?
This means that all txs except the last one will run out of gas. A tx will revert if the conditional fails.

I'd propose a slightly different design. It's not perfect and it adds a small overhead, but it removes the need to calculate the gas per loop and the starting salt of each tx.

To do so, we read the starting salt from slot 0. We then enter the while loop, which keeps iterating as long as Op.GASLEFT is higher than some threshold. The threshold should be neither too low nor too high: it must guarantee that we can afford at least one more iteration plus the SSTORE after the loop. When we exit the loop, we save the current salt in slot 0. The next tx therefore reads this salt as its starting point and begins at the account it would have read if the work had not been split across multiple txs (essentially resuming the loop).
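A toy Python model of the proposed resumable loop, showing how consecutive transactions hand off work through slot 0. It assumes a fixed cost per iteration and ignores intrinsic gas and the SLOAD/SSTORE costs. The class and method names are hypothetical, and the existence guard follows the strict Op.LT check proposed below.

```python
from dataclasses import dataclass, field


@dataclass
class AttackContractModel:
    """Toy model of the resumable attack loop; gas numbers are illustrative."""

    num_deployed: int  # what the factory reports as deployed
    storage: dict[int, int] = field(default_factory=dict)

    def run_tx(self, gas_limit: int, cost_per_iter: int, threshold: int) -> int:
        """Run one attack tx and return how many salts it queried."""
        if threshold < cost_per_iter:
            raise ValueError("threshold must cover at least one more iteration")
        gas = gas_limit
        salt = start = self.storage.get(0, 0)  # SLOAD(0): resume point
        while gas > threshold:  # keep looping while GASLEFT > threshold
            gas -= cost_per_iter  # cold EXTCODESIZE(CREATE2(salt)) + loop overhead
            salt += 1
        # Existence guard before SSTORE: stored salt must be < num_deployed
        if not salt < self.num_deployed:
            raise RuntimeError("revert: not enough contracts deployed")
        self.storage[0] = salt  # SSTORE(0, salt): the next tx resumes here
        return salt - start
```

Two identical transactions run back to back query the same number of salts, and the second starts exactly where the first stopped. With too few deployed contracts, `run_tx` raises, mirroring the revert, and slot 0 is left unchanged.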

The test verification should check that these transactions do not OOG.

To verify that the accounts we queried exist, we can use the return value of the factory contract (the num_contracts). Before performing the SSTORE, we should check that the salt we are storing is less (strictly less, so Op.LT, <) than num_deployed as returned from the factory contract.

If a tx OOGs or reverts, it means either:

  1. There was not enough gas left for the loop + SSTORE (therefore each tx will start at salt 0)
  2. There were not enough contracts deployed

This setup removes the need for the gas calculations. It adds overhead to the EVM (the gas check and the final SSTORE), which "wastes" some gas. However, since we are querying cold accounts (at least 2600 gas each) under the Fusaka tx gas limit (16+M), the number of extra accounts we could have queried without this overhead is much smaller than the number we do query, so the overhead is negligible. It therefore still gives a good indication of the worst case, and it avoids error-prone gas calculations (if we edit the code, we need to edit the gas calculations too, and we might forget to do one of the two 😅).
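A rough back-of-the-envelope check of the "negligible" claim, assuming 2,600 gas per cold account access (EIP-2929), the 2^24 tx gas cap, and the 50K exit threshold that a later commit in this PR adopts. It ignores intrinsic gas and per-iteration loop overhead, so treat it as an upper-bound estimate only:

```python
COLD_ACCOUNT_ACCESS_GAS = 2_600  # EIP-2929 cold account access cost
TX_GAS_LIMIT_CAP = 2**24  # 16,777,216
LOOP_EXIT_THRESHOLD = 50_000  # reserve from the later "gas < 50K" commit


def reserve_overhead(
    tx_gas: int = TX_GAS_LIMIT_CAP,
    reserve: int = LOOP_EXIT_THRESHOLD,
    access_cost: int = COLD_ACCOUNT_ACCESS_GAS,
) -> tuple[int, int, float]:
    """Upper bound on cold accesses lost to the end-of-loop gas reserve."""
    max_accesses = tx_gas // access_cost
    lost_accesses = -(-reserve // access_cost)  # ceiling division
    return max_accesses, lost_accesses, lost_accesses / max_accesses


max_accesses, lost, fraction = reserve_overhead()
```

Under these assumptions the reserve costs at most 20 of roughly 6,452 possible cold accesses per transaction, about 0.3%.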

Op.SHA3(11, 85) # 85 bytes from offset 11
# Hash result - EVM auto-truncates to 20-byte address
# Call EXTCODESIZE and discard result (no storage writes!)
+ Op.EXTCODESIZE
Member

With above suggestion, we can directly edit this test to perform the other operations also!
The EVM gas check loop will take care of the gas calculations and the loop exits.

If we now have Op.POP(Op.EXTCODESIZE(Op.SHA3(11, 85))), we can edit this to:

Op.POP(Op.EXTCODEHASH(Op.SHA3(11, 85)))
Op.EXTCODECOPY(Op.SHA3(11, 85), 0, 1, 1) -> note: we have to copy at least one byte, otherwise we do not strictly need to look up the account. We also need to ensure we are not writing to the memory window we use to calculate the CREATE2 address.
Op.POP(Op.CALL(address=Op.SHA3(11, 85), gas=1)) (CALL can be interchanged with CALLCODE, DELEGATECALL, STATICCALL)

The verification contract should stay the same (extcodesize > 0 verifies contract exists).

Note: for the ops which CALL into the target contract, we just send 1 gas, so it does not matter what code it executes. Whether or not it OOGs, it "wastes" at most 1 gas.
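EXTCODECOPY takes (address, destOffset, offset, size), so Op.EXTCODECOPY(Op.SHA3(11, 85), 0, 1, 1) writes a single byte to memory[0], which is outside the 85 bytes hashed by Op.SHA3(11, 85), i.e. memory[11:96]. A small overlap check for choosing a safe destination; the helper name is hypothetical:

```python
# Memory bytes hashed by Op.SHA3(11, 85): offsets 11..95 inclusive
CREATE2_WINDOW = range(11, 11 + 85)


def clobbers_window(dest_offset: int, size: int, window: range = CREATE2_WINDOW) -> bool:
    """True if writing memory[dest_offset : dest_offset + size] overlaps window."""
    if size == 0:
        return False  # a zero-length copy writes nothing
    return dest_offset < window.stop and window.start < dest_offset + size
```

`clobbers_window(0, 1)` is False, so the suggested destination is safe, while a 32-byte write at offset 64 falls inside the hashed window and is flagged.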

@jochem-brouwer (Member)

Let me know if anything is unclear, happy to help 😄 👍

@LouisTsai-Csie (Collaborator)

This PR looks nice. I will add some formatting suggestions later, and I wonder if we could remove some of the comments.

…hardcoded value

Replace hardcoded FUSAKA_TX_GAS_LIMIT constant with dynamic
fork.transaction_gas_limit_cap() to correctly use 16,777,216 (2^24)
per EIP-7825 and adapt to future fork gas limits.
…nceof

The gas calculation was missing the per-contract overhead term that accounts
for loop setup/teardown costs (counter init, JUMPDEST, condition check, etc).

This aligns with the pattern used in test_sstore_erc20_approve.
CPerezz (Contributor, Author) commented Jan 6, 2026

General Q: where do I find the initcode of the deployed contracts? I understand that we can choose any code here (via the stubs), but would like to see what contracts are initialized.

You can get this from the deployer scenario that you can find in: ethpandaops/spamoor#161

LMK if something isn't clear.

@jochem-brouwer (Member)

Thanks for the spamoor URL 😄 👍

I have one more argument for storing the pointer to the next salt in the EVM. With the calldata approach, clients can parallelize these transactions optimistically: we do not need to execute previous transactions in order to execute the next one, because even though they hit the same attack contract, they do not read or write the same storage. If we store the pointer in the EVM instead, we force all clients to execute these transactions serially, because the second (attack) transaction has to "know" which salt to start from and therefore has to execute the first (attack) transaction first. (Bonus: BALs fix this and make them parallelizable, because the pointer can now be read from the BAL.)
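The scheduling argument can be illustrated with a simplified read/write-set conflict check, the kind of test an optimistic parallel executor performs. This is a model for illustration, not any client's scheduler; the names are hypothetical, and EXTCODESIZE targets are omitted because read-only account accesses alone do not force an ordering.

```python
from dataclasses import dataclass

Slot = tuple[str, int]  # (contract label, storage key)


@dataclass(frozen=True)
class TxAccess:
    """Storage slots a transaction reads and writes (simplified)."""

    reads: frozenset[Slot] = frozenset()
    writes: frozenset[Slot] = frozenset()


def must_serialize(earlier: TxAccess, later: TxAccess) -> bool:
    """True if `later` cannot safely run in parallel with `earlier`."""
    write_read = earlier.writes & later.reads  # later reads what earlier wrote
    write_write = earlier.writes & later.writes
    read_write = earlier.reads & later.writes
    return bool(write_read or write_write or read_write)


# Calldata design: start salt arrives via calldata, no shared storage
calldata_tx = TxAccess()
# Storage design: every attack tx does SLOAD(0) and SSTORE(0, next_salt)
slot0 = frozenset({("attack", 0)})
storage_tx = TxAccess(reads=slot0, writes=slot0)
```

Two calldata-style attack txs never conflict, while two storage-style txs always do, so the second must wait for the first.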

Let me know if I should assist here, it's a rather big change in the logic, happy to help 😄 👍

Implement Jochem's suggested gas-based loop exit strategy:
- Attack contract reads/writes salt from storage slot 0
- Loop exits when gas < 50K, saves salt for next TX to resume
- Each TX automatically continues from where previous left off
- No manual gas calculations needed - contract self-regulates

Changes:
- Remove calculate_gas_per_iteration() function
- Simplify test function - no calldata/iteration calculations
- Update build_attack_contract() with SLOAD/SSTORE for state
- Store last EXTCODESIZE result in slot 1 for verification

This eliminates error-prone gas calculations and makes the
benchmark self-correcting regardless of EVM changes.
CPerezz (Contributor, Author) commented Jan 7, 2026

@jochem-brouwer that makes total sense! Thanks for the suggestion. You are right.

I think the latest commit does what you suggested! Of course, I can't push anything >24KB here, but I will run this benchmark locally.

@CPerezz CPerezz requested a review from jochem-brouwer January 7, 2026 08:52
jochem-brouwer (Member) left a comment

Since we now verify the execution of this attack by inspecting the storage, the verification tx is no longer necessary. It is also really nice to use memory instead of stack manipulation. There are two places where the bytecode seems to be incorrect.

Let me add here how I tested this.

I added some extra code:

    factory_stub = get_factory_stub_name(bytecode_size_kb)

    # DUMMY FACTORY (local testing only): returns (num_deployed, initcode_hash)
    # Dummy initcode; only its hash is used to derive the CREATE2 addresses
    INITCODE = Op.STOP * expected_size_bytes
    INITCODE_HASH = Bytes(INITCODE).keccak256()  # dummy hash
    NUM_DEPLOYED = 0xFFFFFF

    factory_code = (
        Op.MSTORE(0, NUM_DEPLOYED)  # memory[0:32]: num deployed
        + Op.MSTORE(32, INITCODE_HASH)  # memory[32:64]: initcode hash
        + Op.RETURN(0, 64)
    )

    # Deploy the dummy factory at the address taken from the stub file
    factory_address = pre.deploy_contract(
        code=factory_code,
        stub=factory_stub,
    )

    # Pre-deploy target contracts at their CREATE2 addresses (salts 0..15)
    for salt in range(16):
        pre.deploy_contract(
            address=compute_create2_address(
                address=factory_address, salt=salt, initcode=INITCODE
            ),
            code=Op.STOP * expected_size_bytes,
        )

    # Build and deploy the attack contract with storage initialized

(This requires importing compute_create2_address and Bytes from execution_testing.)

This sets up a dummy factory contract (it simply returns the number of deployed contracts and the initcode hash) and also sets up the target contracts.
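The dummy factory's whole interface is its 64 bytes of return data: num_deployed in the first word and the initcode hash in the second. A stdlib-only sketch of that layout, e.g. for sanity-checking an eth_call result; the helper names are hypothetical, and it assumes the real factories return the same layout that the dummy imitates.

```python
def encode_factory_return(num_deployed: int, initcode_hash: bytes) -> bytes:
    """Mirror MSTORE(0, n) + MSTORE(32, hash) + RETURN(0, 64)."""
    if len(initcode_hash) != 32:
        raise ValueError("initcode_hash must be 32 bytes")
    return num_deployed.to_bytes(32, "big") + initcode_hash


def decode_factory_return(data: bytes) -> tuple[int, bytes]:
    """Split 64 bytes of return data into (num_deployed, initcode_hash)."""
    if len(data) != 64:
        raise ValueError(f"expected 64 bytes, got {len(data)}")
    return int.from_bytes(data[:32], "big"), data[32:]
```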

Testing with: uv run fill -m stateful ./tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py --clean --evm-dump-dir=./dump --traces --block-gas-limit 20000000

@jochem-brouwer (Member)

There is another problem with the post state verification:

    post = {
        verification_address: Account(
            storage={
                0: expected_size_bytes,  # EXTCODESIZE on salt 0
            }
        ),
        attack_address: Account(
            storage={
                # Slot 1: last EXTCODESIZE result should match expected size
                1: expected_size_bytes,
            }
        ),
    }

For attack_address we only want to verify key 1; key 0 stores the current salt. EELS rejects the post-state because it also contains key 0. (This only affects uv run fill and might not show up with uv run execute.)

LouisTsai-Csie (Collaborator) left a comment

Hi @CPerezz, I added some refactoring suggestions but did not change the logic much, as it is quite good. The comments are very clear, but I found some repeated ones; could you help clean them up? Thanks

@jochem-brouwer (Member)

For the storage to work with uv run fill, we need:

    attack_storage = Storage(
        {
            # Slot 1: last EXTCODESIZE result should match expected size
            1: expected_size_bytes,
        }
    )
    # Slot 0 holds the resume salt; accept any value there
    attack_storage.set_expect_any(0)

    # Post-state verification:
    # 1. Verify that verification contract stored expected size (salt 0)
    # 2. Verify attack contract's last EXTCODESIZE returns expected size
    #    (proves the gas-based loop ran and accessed real contracts)
    post = {
        verification_address: Account(
            storage={
                0: expected_size_bytes,  # EXTCODESIZE on salt 0
            }
        ),
        attack_address: Account(
            storage=attack_storage
        ),
    }

This skips the storage key 0 😄 (can have any value). We only care about key 1, which should have the EXTCODESIZE of the last salt we ran.
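A simplified Python model of the comparison described here: by default, any nonzero storage key that is not expected fails the check, and set_expect_any(0) turns key 0 into a wildcard. This only illustrates the semantics; it is not the execution_testing implementation, and storage_matches is a hypothetical name.

```python
from collections.abc import Iterable, Mapping


def storage_matches(
    expected: Mapping[int, int],
    actual: Mapping[int, int],
    expect_any: Iterable[int] = (),
) -> bool:
    """Compare post-state storage strictly, except for wildcard keys.

    Keys missing from a mapping are treated as 0, as in EVM storage.
    """
    keys = (set(expected) | set(actual)) - set(expect_any)
    return all(expected.get(k, 0) == actual.get(k, 0) for k in keys)


# Illustrative attack-contract post-state: slot 0 = resume salt, slot 1 = size
actual = {0: 17_637, 1: 24_576}
expected = {1: 24_576}
```

`storage_matches(expected, actual)` is False because key 0 is unexpected; with `expect_any=[0]` it is True.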

- Fix GT operand order bugs (loop condition was inverted)
- Add Storage.set_expect_any(0) for uv run fill compatibility
- Refactor with cleaner code structure per LouisTsai-Csie suggestions
- Use tx_gas_limit fixture instead of manual call
- Remove redundant defaults and print statements
jochem-brouwer (Member) left a comment

I ran the test locally using the dummy factory and dummy contracts as described above (with uv run fill), and it throws:

GasUsedExceedsLimitError('gas used exceeds limit')

This is because we have a verification tx and an attack tx, and in my particular setup the total gas of verification + attack goes over the block gas limit.
@LouisTsai-Csie suggests removing the verification tx, and we can indeed do that, as the final verification is the storage check of the attack contract (it checks that the final salt queried an account with the expected code size).

diff --git a/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py b/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
index ba66da5b5..52398ec08 100644
--- a/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
+++ b/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
@@ -293,12 +293,12 @@ def test_extcodesize_bytecode_sizes(
     txs = []
 
     # First transaction: verification (runs first, uses minimal gas)
-    verification_tx = Transaction(
-        gas_limit=verification_gas,
-        to=verification_address,
-        sender=sender,
-    )
-    txs.append(verification_tx)
+    #verification_tx = Transaction(
+    #    gas_limit=verification_gas,
+    #    to=verification_address,
+    #    sender=sender,
+    #)
+    #txs.append(verification_tx)
 
     # Attack transactions: all identical, no calldata needed
     for _ in range(num_attack_txs):
@@ -319,7 +319,7 @@ def test_extcodesize_bytecode_sizes(
     attack_storage.set_expect_any(0)
 
     post = {
-        verification_address: Account(storage={0: expected_size_bytes}),
+        #verification_address: Account(storage={0: expected_size_bytes}),
         attack_address: Account(storage=attack_storage),
     }

Removing the verification tx makes it pass locally (please confirm whether the verification tx can indeed be removed).

jochem-brouwer identified that verification TX + attack TX exceed block
gas limit when running `uv run fill`. The attack contract's slot 1
already provides sufficient verification (stores last EXTCODESIZE result).

Changes:
- Remove verification TX and related code
- Remove unused build_verification_contract() function
- Remove unused calculate_verification_gas() function
- Revert unintended changes to test_single_opcode.py
- Update module docstring to reflect new block structure
jochem-brouwer (Member) left a comment

LGTM, great work!

Tested locally, it passes 😄 👍


LouisTsai-Csie (Collaborator) left a comment

Thanks! LGTM

@LouisTsai-Csie LouisTsai-Csie merged commit b3543e9 into ethereum:forks/amsterdam Jan 8, 2026
11 checks passed

4 participants