feat(benchmark): add EXTCODESIZE bytecode size benchmark for cold access testing #1961
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff (base forks/amsterdam vs. #1961):
            forks/amsterdam    #1961
Coverage    86.33%             86.33%
Files       538                538
Lines       34557              34557
Branches    3222               3222
Hits        29835              29835
Misses      4148               4148
Partials    574                574
jochem-brouwer
left a comment
Saw this PR, some small comments/ideas 😄 👍
tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
Support for multiple contract sizes (0.5KB to 24KB) with automatic splitting of gas budget into 16M transactions respecting Fusaka limit. Includes deployment tools for test infrastructure setup.
- Remove unused calculate_init_hashes.py script
- Fix nonce handling by using pending tx count to avoid "already known" errors
- Use actual network block gas limit instead of hardcoded Fusaka limit
- Add fixed gas price for dev mode compatibility
- Fix size key naming for integer sizes (e.g., 1.0 -> "1kb" not "1.0kb")
Rewrite the test to maximize cold EXTCODESIZE calls per block:
- Add attack contract that loops EXTCODESIZE on CREATE2-derived addresses
- Each tx passes a starting salt via calldata to access unique contracts
- Fill blocks with 3x 16M gas transactions (~17,637 cold accesses/block)
- Add verification tx that stores EXTCODESIZE result in storage
- Add 2KB size to test parameters
- Calculate gas per iteration dynamically from fork gas costs
The test now properly stresses client state loading by ensuring all EXTCODESIZE operations are cold (first access to each contract).
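As a rough illustration of how the starting salts line up across the three attack transactions, the sketch below splits the quoted ~17,637 accesses evenly. The per-transaction count of 5,879 is inferred from that figure (17,637 / 3), not stated in the commit, and `salt_ranges` is a hypothetical helper, not part of the test.

```python
def salt_ranges(num_txs: int, iterations_per_tx: int) -> list[tuple[int, int]]:
    """Return the inclusive (first_salt, last_salt) range covered by each tx."""
    return [
        (i * iterations_per_tx, (i + 1) * iterations_per_tx - 1)
        for i in range(num_txs)
    ]


# Inferred from the "~17,637 cold accesses/block" figure spread over 3 txs.
ITERATIONS_PER_TX = 17_637 // 3

ranges = salt_ranges(num_txs=3, iterations_per_tx=ITERATIONS_PER_TX)
for tx_index, (first, last) in enumerate(ranges, start=1):
    print(f"attack tx {tx_index}: salts {first}..{last}")
```

Each transaction would receive the first element of its range as the starting salt in calldata, so no two transactions touch the same contract and every access stays cold.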
Add comprehensive docstring explaining the benchmark architecture, execution commands, and block structure with cold access guarantees.
These utility scripts are for local deployment only and don't need to be part of the test suite. They contain linting issues that are not worth fixing for non-test utility code.
- Wrap long lines in docstrings to comply with 79 char limit
- Use raw docstring for bash command examples with backslashes
- Shorten comments to fit line length requirements
Add type annotations for GasCosts and Address parameters to satisfy mypy strict type checking.
EVM automatically truncates to 20-byte address when executing EXTCODESIZE, so the PUSH20(0xFF...FF) + AND masking is unnecessary. Ref: https://github.com/ethereum/go-ethereum/blob/b635e063/core/vm/instructions.go#L337
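A small sketch of why the explicit mask is redundant, modelling 256-bit stack words as Python integers. The 32-byte value below is an arbitrary stand-in for a hash output, not a real keccak256 result, and `truncate_to_address` is a hypothetical name for the truncation the opcode applies.

```python
ADDRESS_MASK = (1 << 160) - 1  # the value pushed by PUSH20(0xFF...FF)


def truncate_to_address(word: int) -> int:
    """Keep only the low 160 bits (20 bytes) of a 256-bit stack word."""
    return word & ADDRESS_MASK


# Arbitrary 32-byte stand-in for a hash output (not a real keccak256 digest).
hash_bytes = bytes(range(1, 33))
hash_word = int.from_bytes(hash_bytes, "big")

masked_first = truncate_to_address(hash_word & ADDRESS_MASK)  # PUSH20 + AND, then opcode
unmasked = truncate_to_address(hash_word)                     # opcode alone
low_20_bytes = int.from_bytes(hash_bytes[12:], "big")         # last 20 bytes of the hash

print(hex(unmasked))
```

Because truncation is idempotent, masking before the opcode yields the same address as letting the opcode truncate on its own.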
Replace the PUSH2(0x1000) + JUMPI pattern (which relied on jumping to an invalid offset to trigger failure) with an explicit Conditional + REVERT. This is cleaner, more explicit about error handling intent, and doesn't rely on undefined behavior of invalid jump destinations.
Force-pushed from 14cccd8 to 628f140.
jochem-brouwer
left a comment
General suggestion to remove the need for gas calculations. I think this kind of pattern is something we should at some point make into a template (not necessary now, but just noting for future cleanup). I think this kind of attack pattern (execute certain code against a CREATE2 factory) is something which we want to use more often and could reuse.
As mentioned in the comments it is not perfect, but the overhead is small enough compared to what we are measuring here (cold account access).
General Q: where do I find the initcode of the deployed contracts? I understand that we can choose any code here (via the stubs), but would like to see what contracts are initialized.
Have added a suggestion to directly add the other operations EXTCODEx and xCALLxs here 😄 👍
tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
+ Op.PUSH1(32)
+ Op.MSTORE  # Store starting_salt at memory[32]
# Stack: [num_deployed, starting_salt]
+ Op.POP  # Stack: [num_deployed]
So the counter here is num_deployed, right?
This means that all txs except the last one will run out of gas. A tx will revert if the conditional fails.
I'd propose a slightly different design. It's not perfect and it adds a small overhead, but it saves calculating the gas per loop and the starting salt of each tx.
To do so, we read the starting salt from slot 0. We then enter the while loop, which keeps running as long as Op.GASLEFT is higher than some threshold (not too low and not too high: it should ensure we can afford at least one more loop iteration plus the SSTORE at the end of the loop). When we exit the loop, we save the current salt in slot 0. The next tx therefore reads that salt and starts at the account we would have read next if the work had not been split across multiple txs (so it essentially resumes the loop).
The test verification should check that these transactions do not OOG.
To verify that the accounts we queried exist, we can use the return value of the factory contract: the num_contracts. Before performing the SSTORE, we should check that the salt we are storing is less (strictly less, so Op.LT, <) than num_deployed as returned from the factory contract.
If a tx OOGs or reverts, it means either:
- There was not enough gas left for the loop + SSTORE (therefore each tx will start at salt 0)
- There were not enough contracts deployed
This setup removes the need for the gas calculations. It adds overhead to the EVM (the gas check and the final SSTORE), which "wastes" gas. However, since we are querying cold accounts (at least 2600 gas each) under the Fusaka tx gas limit (16M+), the number of extra accounts we could have queried without this overhead is much smaller than the number of accounts we query here, so it is negligible. It therefore gives a good indication of the worst case, and it saves the error-prone gas calculations (if we edit the code, we need to edit the gas calculations again, and we might forget to do one of the two 😅).
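The proposed design can be sketched as a toy Python model of the attack contract. This is not the contract's bytecode: the per-iteration cost and the threshold are assumed numbers chosen for illustration, intrinsic and SLOAD costs are ignored, and `AttackContractModel` and `Revert` are hypothetical names.

```python
from dataclasses import dataclass, field

# Illustrative numbers only; none of these are taken from the test itself.
GAS_PER_ITERATION = 2_700  # assumed: 2,600 cold account access + loop overhead
GAS_THRESHOLD = 50_000     # assumed: enough for one more iteration + final SSTORE


class Revert(Exception):
    """Stands in for an EVM REVERT in this simulation."""


@dataclass
class AttackContractModel:
    """Toy model of the attack contract; slot 0 holds the next salt."""

    num_deployed: int
    storage: dict = field(default_factory=lambda: {0: 0})

    def run_tx(self, gas_limit: int) -> list:
        salt = self.storage.get(0, 0)    # SLOAD slot 0: the resume point
        gas_left = gas_limit             # intrinsic and SLOAD costs ignored
        visited = []
        while gas_left > GAS_THRESHOLD:  # gas-based loop condition
            visited.append(salt)         # one cold account access for this salt
            gas_left -= GAS_PER_ITERATION
            salt += 1
        if not salt < self.num_deployed:  # strict check, as the comment suggests
            raise Revert("not enough contracts deployed")
        self.storage[0] = salt            # SSTORE: the next tx resumes here
        return visited


contract = AttackContractModel(num_deployed=100_000)
first = contract.run_tx(1_000_000)
second = contract.run_tx(1_000_000)
print(first[0], first[-1], second[0], second[-1])
```

In this model the second transaction picks up exactly where the first stopped, without the test having to compute iteration counts or starting salts.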
Op.SHA3(11, 85)  # 85 bytes from offset 11
# Hash result - EVM auto-truncates to 20-byte address
# Call EXTCODESIZE and discard result (no storage writes!)
+ Op.EXTCODESIZE
With the above suggestion, we can directly edit this test to perform the other operations as well!
The EVM gas check loop will take care of the gas calculations and the loop exits.
If we now have Op.POP(Op.EXTCODESIZE(Op.SHA3(11, 85))), we can edit this to:
- Op.POP(Op.EXTCODEHASH(Op.SHA3(11, 85)))
- Op.EXTCODECOPY(Op.SHA3(11, 85), 0, 1, 1) (note: we have to copy at least one byte, otherwise we do not strictly need to look up the account; we also need to ensure we are not writing to the window we are using to calculate the CREATE2 address)
- Op.POP(Op.CALL(address=Op.SHA3(11, 85), gas=1)) (CALL can be interchanged with CALLCODE, DELEGATECALL, STATICCALL)
The verification contract should stay the same (extcodesize > 0 verifies contract exists).
Note: for the ops which CALL into the target contract, we just send 1 gas, so it does not matter what code it executes. Whether it OOGs or not, it will "waste" at most 1 gas.
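To make the "window" concern concrete, the sketch below lays out memory so that bytes 11..95 form the standard 85-byte CREATE2 preimage (0xff, factory address, salt, initcode hash). The exact layout is an assumption consistent with Op.SHA3(11, 85) and the salt stored at memory[32]; both helper names are hypothetical. The hashing step itself is left out, since Python's hashlib sha3_256 is not Keccak-256.

```python
def build_create2_memory(factory: bytes, salt: int, initcode_hash: bytes) -> bytes:
    """Return 96 bytes of memory whose slice [11:96] is the CREATE2 preimage."""
    assert len(factory) == 20 and len(initcode_hash) == 32
    word0 = bytes(11) + b"\xff" + factory  # memory[0:32]: padding, 0xff, factory
    word1 = salt.to_bytes(32, "big")       # memory[32:64]: salt
    return word0 + word1 + initcode_hash   # memory[64:96]: initcode hash


WINDOW = range(11, 11 + 85)  # bytes hashed by Op.SHA3(11, 85)


def writes_into_window(dest_offset: int, size: int) -> bool:
    """True if a memory write of `size` bytes at `dest_offset` overlaps WINDOW."""
    return size > 0 and dest_offset < WINDOW.stop and dest_offset + size > WINDOW.start


memory = build_create2_memory(b"\x11" * 20, salt=5, initcode_hash=b"\x22" * 32)
preimage = memory[WINDOW.start:WINDOW.stop]
print(len(preimage), hex(preimage[0]))

# EXTCODECOPY(address, 0, 1, 1): assuming EVM argument order, this copies one
# byte to memory offset 0, which lies in the unused padding before the window.
print(writes_into_window(dest_offset=0, size=1))
```

Under this layout a one-byte copy to offset 0 is safe, whereas a write at offset 32 would corrupt the salt for the next iteration.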
Let me know if anything is unclear, happy to help 😄 👍
This PR looks nice. I will add some formatting suggestions later, and I wonder if we could remove some of the comments.
…hardcoded value Replace hardcoded FUSAKA_TX_GAS_LIMIT constant with dynamic fork.transaction_gas_limit_cap() to correctly use 16,777,216 (2^24) per EIP-7825 and adapt to future fork gas limits.
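For illustration, splitting a gas budget into transactions that respect the cap could look like the sketch below. The cap is hardcoded here only to keep the example self-contained (the commit reads it from fork.transaction_gas_limit_cap()), and `split_gas_budget` with its `min_gas` parameter is a hypothetical helper, not code from the PR.

```python
TX_GAS_LIMIT_CAP = 2**24  # 16,777,216 per EIP-7825
INTRINSIC_GAS = 21_000    # base cost of any transaction


def split_gas_budget(
    budget: int,
    cap: int = TX_GAS_LIMIT_CAP,
    min_gas: int = INTRINSIC_GAS,
) -> list[int]:
    """Split `budget` into per-tx gas limits, none exceeding `cap`.

    A trailing remainder smaller than `min_gas` is dropped, since such a
    transaction could not even pay its intrinsic cost.
    """
    full_txs, remainder = divmod(budget, cap)
    limits = [cap] * full_txs
    if remainder >= min_gas:
        limits.append(remainder)
    return limits


limits = split_gas_budget(60_000_000)
print(len(limits), limits)
```

Reading the cap from the fork rather than a constant means the same splitting logic keeps working if a later fork changes the limit.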
…nceof The gas calculation was missing the per-contract overhead term that accounts for loop setup/teardown costs (counter init, JUMPDEST, condition check, etc). This aligns with the pattern used in test_sstore_erc20_approve.
You can get this from the deployer scenario, which you can find in ethpandaops/spamoor#161. LMK if something isn't clear.
Thanks for the spamoor URL 😄 👍
I have one more argument why I think we should switch to storing the pointer of the next salt into the EVM. If we do this via the calldata approach, then we can parallelize these transactions in an optimistic way: we do not need to execute previous transactions in order to execute the next one, because even though they hit the same attack contract, they do not read/write the same storage.
So if we store the pointer into the EVM, we force all clients to execute these transactions serially, because the second (attack) transaction has to "know" which salt it has to start from, and therefore has to execute the first (attack) transaction first. (Bonus: BALs fix this, making them parallelizable, because we can now read the pointer from the BAL.)
Let me know if I should assist here, it's a rather big change in the logic, happy to help 😄 👍
Implement Jochem's suggested gas-based loop exit strategy:
- Attack contract reads/writes salt from storage slot 0
- Loop exits when gas < 50K, saves salt for next TX to resume
- Each TX automatically continues from where previous left off
- No manual gas calculations needed - contract self-regulates
Changes:
- Remove calculate_gas_per_iteration() function
- Simplify test function - no calldata/iteration calculations
- Update build_attack_contract() with SLOAD/SSTORE for state
- Store last EXTCODESIZE result in slot 1 for verification
This eliminates error-prone gas calculations and makes the benchmark self-correcting regardless of EVM changes.
@jochem-brouwer that makes total sense! Thanks for the suggestion, you are right. I think the latest commit should do what you suggested! Of course, I can't push anything >24KB here, but I will run this bench locally.
jochem-brouwer
left a comment
Since we now verify the execution of this attack by inspecting the storage, the verification tx is not necessary anymore. Also really nice to use the memory instead of stack manipulation. I have two points in which the bytecode seems to be incorrect.
Let me add here how I tested this.
I added some extra code:
factory_stub = get_factory_stub_name(bytecode_size_kb)

# Dummy factory setup (local testing only).
# Requires compute_create2_address and Bytes imported from execution_testing.
INITCODE = Op.STOP * expected_size_bytes
INITCODE_HASH = Bytes(INITCODE).keccak256()  # dummy hash
NUM_DEPLOYED = 0xFFFFFF
factory_code = (
    Op.MSTORE(0, NUM_DEPLOYED)  # num deployed
    + Op.MSTORE(32, INITCODE_HASH)  # initcode hash
    + Op.RETURN(0, 64)
)

# Deploy the dummy factory (address comes from stub file)
factory_address = pre.deploy_contract(
    code=factory_code,
    stub=factory_stub,
)

# Pre-deploy the first 16 target contracts at their CREATE2 addresses
for salt in range(16):
    pre.deploy_contract(
        address=compute_create2_address(
            initcode=INITCODE,
            salt=salt,
            address=factory_address,
        ),
        code=Op.STOP * expected_size_bytes,
    )

# Build and deploy the attack contract with storage initialized
This sets up a dummy contract for the factory (it simply returns the number deployed and the hash) and also sets up the target contracts.
Testing with: uv run fill -m stateful ./tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py --clean --evm-dump-dir=./dump --traces --block-gas-limit 20000000
There is another problem with the post state verification: For
LouisTsai-Csie
left a comment
Hi @CPerezz, I added some refactoring suggestions but did not change the logic much, as it is quite good. The comments are very clear, but I found some repeated ones. Could you help clean them up? Thanks
For the storage to work with
attack_storage = Storage({
    # Slot 1: last EXTCODESIZE result should match expected size
    1: expected_size_bytes,
})
attack_storage.set_expect_any(0)

# Post-state verification:
# 1. Verify that verification contract stored expected size (salt 0)
# 2. Verify attack contract's last EXTCODESIZE returns expected size
#    (proves the gas-based loop ran and accessed real contracts)
post = {
    verification_address: Account(
        storage={
            0: expected_size_bytes,  # EXTCODESIZE on salt 0
        }
    ),
    attack_address: Account(
        storage=attack_storage
    ),
}
This skips the storage key 0 😄 (can have any value). We only care about key 1, which should have the EXTCODESIZE of the last salt we ran.
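The "any value" semantics can be illustrated with a standalone checker. This is not the framework's Storage implementation; `ANY` and `check_storage` are hypothetical names used only to show the idea of skipping one key while still asserting another.

```python
ANY = object()  # hypothetical sentinel meaning "do not check this slot"


def check_storage(expected: dict, actual: dict) -> list:
    """Compare expected slots against actual storage, skipping ANY slots."""
    errors = []
    for key, want in expected.items():
        if want is ANY:
            continue                 # slot may hold any value
        got = actual.get(key, 0)     # unset EVM storage slots read as zero
        if got != want:
            errors.append(f"slot {key}: expected {want}, got {got}")
    return errors


expected_size_bytes = 24 * 1024
expected = {0: ANY, 1: expected_size_bytes}

# Slot 0 holds whatever salt the loop stopped at; only slot 1 is asserted.
ok = check_storage(expected, {0: 17_637, 1: expected_size_bytes})
bad = check_storage(expected, {0: 17_637, 1: 0})
print(ok, bad)
```

Skipping slot 0 matters here because the final salt depends on gas costs, which is exactly the value the gas-based loop was meant to avoid computing in the test.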
- Fix GT operand order bugs (loop condition was inverted)
- Add Storage.set_expect_any(0) for uv run fill compatibility
- Refactor with cleaner code structure per LouisTsai-Csie suggestions
- Use tx_gas_limit fixture instead of manual call
- Remove redundant defaults and print statements
jochem-brouwer
left a comment
I ran the test locally using the dummy factory and dummy contracts as described above (with uv run fill), and it throws:
GasUsedExceedsLimitError('gas used exceeds limit')
This is because we have a verification tx and an attack tx, and in my particular setup the total gas of verification + attack goes over the block gas limit.
@LouisTsai-Csie suggests removing the verification tx and we can indeed do that, as the final verification is the storage check of the attack contract (checks that final salt queries an account with expected code size)
diff --git a/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py b/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
index ba66da5b5..52398ec08 100644
--- a/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
+++ b/tests/benchmark/stateful/bloatnet/test_extcodesize_bytecode_sizes.py
@@ -293,12 +293,12 @@ def test_extcodesize_bytecode_sizes(
     txs = []
     # First transaction: verification (runs first, uses minimal gas)
-    verification_tx = Transaction(
-        gas_limit=verification_gas,
-        to=verification_address,
-        sender=sender,
-    )
-    txs.append(verification_tx)
+    #verification_tx = Transaction(
+    #    gas_limit=verification_gas,
+    #    to=verification_address,
+    #    sender=sender,
+    #)
+    #txs.append(verification_tx)
     # Attack transactions: all identical, no calldata needed
     for _ in range(num_attack_txs):
@@ -319,7 +319,7 @@ def test_extcodesize_bytecode_sizes(
     attack_storage.set_expect_any(0)
     post = {
-        verification_address: Account(storage={0: expected_size_bytes}),
+        #verification_address: Account(storage={0: expected_size_bytes}),
         attack_address: Account(storage=attack_storage),
     }
Removing the verification tx makes it pass locally (please confirm that the verification tx can indeed be removed).
jochem-brouwer identified that verification TX + attack TX exceed block gas limit when running `uv run fill`. The attack contract's slot 1 already provides sufficient verification (stores last EXTCODESIZE result).
Changes:
- Remove verification TX and related code
- Remove unused build_verification_contract() function
- Remove unused calculate_verification_gas() function
- Revert unintended changes to test_single_opcode.py
- Update module docstring to reflect new block structure
jochem-brouwer
left a comment
LouisTsai-Csie
left a comment
Thanks! LGTM

🗒️ Description
Test EXTCODESIZE with parametrized bytecode sizes using CREATE2 factory.
This test executes EXTCODESIZE operations against pre-deployed contracts
via factories, measuring the performance impact of different contract
sizes on EXTCODESIZE operations.
Designed for execute mode only - contracts must be pre-deployed.
The test maximizes cold EXTCODESIZE calls to stress client state loading:
This benchmark measures the performance impact of
EXTCODESIZE operations on contracts of varying sizes (0.5KB to 24KB).
It stresses client state loading by maximizing cold EXTCODESIZE calls.
Overview
The test deploys attack contracts that loop through thousands of unique
contract addresses, calling EXTCODESIZE on each.
By using CREATE2 address derivation, the test accesses pre-deployed
contracts without storing their addresses, maximizing the number of cold
state accesses per block.
┌────────────────────────────────────────────────────────────┐
│                         Test Block                         │
├────────────────────────────────────────────────────────────┤
│ TX1: Verification (~30K gas)                               │
│   └─> Calls EXTCODESIZE on last contract, stores result    │
│                                                            │
│ TX2: Attack (~16M gas)                                     │
│   └─> Loops EXTCODESIZE on salts 0..5,878                  │
│                                                            │
│ TX3: Attack (~16M gas)                                     │
│   └─> Loops EXTCODESIZE on salts 5,879..11,757             │
│                                                            │
│ TX4: Attack (~16M gas)                                     │
│   └─> Loops EXTCODESIZE on salts 11,758..17,636            │
└────────────────────────────────────────────────────────────┘
Execute a Single Size
Execute All Sizes