Tags: ethereum/execution-spec-tests

v5.4.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore: update steel website blog post links (#2318)

bal@v2.0.0

chore: update steel website blog post links (#2318)

bal@v1.8.0

chore: update steel website blog post links (#2318)

bal@v1.7.0

chore: update steel website blog post links (#2318)

benchmark@v0.0.6

chore(docs,ci): update docs for weld finalization; disable ci workflows (#2317)

* chore(ci): disable cron schedule for eip checksums and links

* doc: update pr template and readme for weld finalization

* doc: remove bullet point

bal@v1.6.0

chore(docs,ci): update docs for weld finalization; disable ci workflows (#2317)

bal@v1.5.0

chore(docs,ci): update docs for weld finalization; disable ci workflows (#2317)

bal@v1.4.1

chore(docs,ci): update docs for weld finalization; disable ci workflows (#2317)

benchmark@v0.0.5

feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support (#2256)

* feat(benchmark): add SLOAD benchmark test with multi-contract support

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on
non-existing storage slots using ERC20 balanceOf() queries.

The idea of this benchmark is to call balanceOf() for non-existing
addresses, within a single contract or across a series of N contracts.
This way, we force clients to resolve as many tree branches as possible.

* feat(benchmark): add SSTORE benchmark test using ERC20 approve

Add test_sstore_erc20_approve that benchmarks SSTORE operations by calling
approve(spender, amount) on pre-deployed ERC20 contracts. Follows the same
pattern as the SLOAD benchmark:
- Auto-discovers ERC20 contracts from stubs
- Splits gas budget evenly across all discovered contracts
- Uses counter as both spender address and amount
- Forces SSTOREs to allowance mapping storage slots

The test measures client performance when writing to many storage slots
across multiple contracts, stressing state-handling write operations.

* fix(benchmark): correct SSTORE benchmark gas calculation

Fixed gas calculation for test_sstore_erc20_approve to ensure accurate
gas usage prediction and prevent transaction reverts:

Key fixes:
- Added memory expansion cost (15 gas per contract)
- Corrected G_LOW gas values in comments (5 gas, not 3)
- Separated per-contract overhead from per-iteration costs
- Improved cost calculation clarity with detailed opcode breakdown

Gas calculation (10M gas, 3 contracts):
- Intrinsic: 21,000
- Overhead per contract: 38
- Cost per iteration: 20,226
- Calls per contract: 164
- Expected gas used: 9,972,306 (99.72% utilization)
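
The figures above can be reproduced with a short calculation. This is a minimal sketch, assuming the gas budget (minus the intrinsic cost) is split evenly per contract and rounded down at each step; the function names are hypothetical and not taken from the test code.

```python
def calls_per_contract(gas_limit: int, num_contracts: int, intrinsic: int,
                       overhead_per_contract: int, cost_per_iteration: int) -> int:
    """Number of loop iterations that fit into one contract's share of the budget."""
    # Assumption: the intrinsic cost is paid once, the rest is split evenly.
    per_contract_budget = (gas_limit - intrinsic) // num_contracts
    return (per_contract_budget - overhead_per_contract) // cost_per_iteration


def expected_gas_used(num_contracts: int, calls: int, intrinsic: int,
                      overhead_per_contract: int, cost_per_iteration: int) -> int:
    """Total gas consumed when every contract runs `calls` iterations."""
    per_contract = overhead_per_contract + calls * cost_per_iteration
    return intrinsic + num_contracts * per_contract


GAS_LIMIT = 10_000_000
calls = calls_per_contract(GAS_LIMIT, 3, 21_000, 38, 20_226)
used = expected_gas_used(3, calls, 21_000, 38, 20_226)
print(calls, used, f"{used / GAS_LIMIT:.2%}")  # 164 9972306 99.72%
```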

* feat(benchmark): add mixed SLOAD/SSTORE benchmark with configurable ratios

Add test_mixed_sload_sstore to test_multi_opcode.py that combines SLOAD
and SSTORE operations with parameterized gas distribution ratios (50-50,
70-30, 90-10).

The test stresses clients with mixed read/write workloads by:
- Dividing gas budget evenly across all discovered ERC20 contract stubs
- Splitting each contract's allocation by the specified percentage ratio
- Executing balanceOf (cold SLOAD on empty slots) for the SLOAD portion
- Executing approve (SSTORE to new allowance slots) for the SSTORE portion

Verified gas calculations for 10M gas budget with 3 contracts (50-50 ratio):
- SLOAD operations: ~2,312 gas/iteration → 719 calls per contract
- SSTORE operations: ~20,226 gas/iteration → 82 calls per contract
- Total operations: 2,403 state operations (2,157 SLOADs + 246 SSTOREs)
- Gas usage: 9.98M / 10M (16K buffer, no out-of-gas errors)

This benchmark enables testing different read/write ratios to identify
client performance characteristics under varying state operation mixes.
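
The 50-50 numbers above follow from a simple split of each contract's budget. The sketch below is illustrative only: it assumes the approximate per-iteration costs quoted above, ignores per-contract overhead, and uses a hypothetical function name.

```python
def mixed_plan(gas_limit: int, num_contracts: int, sload_percent: int,
               intrinsic: int = 21_000, sload_cost: int = 2_312,
               sstore_cost: int = 20_226) -> dict:
    """Split each contract's gas share between SLOAD and SSTORE iterations."""
    per_contract = (gas_limit - intrinsic) // num_contracts
    sload_gas = per_contract * sload_percent // 100
    sstore_gas = per_contract - sload_gas
    sload_calls = sload_gas // sload_cost
    sstore_calls = sstore_gas // sstore_cost
    used = intrinsic + num_contracts * (
        sload_calls * sload_cost + sstore_calls * sstore_cost
    )
    return {
        "sload_calls": sload_calls,    # per contract
        "sstore_calls": sstore_calls,  # per contract
        "total_ops": num_contracts * (sload_calls + sstore_calls),
        "unused_gas": gas_limit - used,
    }


plan = mixed_plan(10_000_000, num_contracts=3, sload_percent=50)
print(plan)
```

Passing `sload_percent=70` or `sload_percent=90` gives the other two parameterized ratios under the same assumptions.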

* refactor(benchmark): optimize SLOAD/SSTORE benchmarks per review feedback

Address review comments by optimizing loop efficiency:

1. Move function selector MSTORE outside loops (Comment #2)
   - BALANCEOF_SELECTOR and APPROVE_SELECTOR now stored once per contract
   - Saves 3 gas (G_VERY_LOW) per iteration
   - Total savings: ~6,471 gas for 50-50 ratio with 10M budget and 3 contracts

2. Remove unused return data from CALL operations (Comment #1)
   - Changed ret_offset=96/128, ret_size=32 to ret_offset=0, ret_size=0
   - Eliminates unnecessary memory expansion
   - Minor gas savings, cleaner implementation

Skipped Comment #3 (use Op.GAS for addresses):
- Would lose determinism (GAS varies per iteration)
- Adds complexity for minimal benefit
- Counter still needed for loop control

Changes applied to:
- test_sload_empty_erc20_balanceof
- test_sstore_erc20_approve
- test_mixed_sload_sstore (both SLOAD and SSTORE loops)

* refactor(benchmark): simplify SLOAD benchmark memory layout and fix calldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0] and MEM[64]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata
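
Why args_offset=28 yields correct ABI calldata can be shown with a small memory emulation. This is a sketch, not the benchmark bytecode: `mstore` is a hypothetical helper mimicking the EVM opcode, the selector value is the standard ERC-20 `balanceOf(address)` selector, and the calldata size of 36 bytes (4-byte selector plus one 32-byte argument) is an assumption.

```python
BALANCEOF_SELECTOR = 0x70A08231  # balanceOf(address)


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE: write `value` as a 32-byte big-endian word at `offset`."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    memory[offset:end] = value.to_bytes(32, "big")


memory = bytearray()
# Stored once, outside the loop: the 4 selector bytes land in MEM[28..31]
# because MSTORE left-pads the value to 32 bytes.
mstore(memory, 0, BALANCEOF_SELECTOR)
# Stored per iteration: the counter doubles as the queried address.
mstore(memory, 32, 7)

# Calldata read from offset 28 starts exactly at the selector.
calldata = bytes(memory[28:28 + 36])
print(calldata.hex())
```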

* refactor(benchmark): simplify SSTORE benchmark memory layout and fix calldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

* refactor(benchmark): simplify mixed SLOAD/SSTORE memory layout and fix calldata encoding

- Move selectors MSTORE outside for-loop (saves gas per contract)
- Use separate memory regions for balanceOf and approve to avoid conflicts
- Fix calldata encoding by using correct args_offset for proper ABI format
- Selectors now properly positioned at start of calldata

* refactor(benchmark): simplify mixed test to reuse memory layout consistently

- Reuse MEM[0] for both selectors (sequential operations, no conflict)
- Reuse MEM[32] for both counters (balanceOf then approve)
- Reuse MEM[64] and MEM[96] for parameters
- Consistent args_offset=28 for both operations (was 28 and 128)
- Matches single-opcode test pattern for easier understanding
- Reduces memory footprint from 196 bytes to 96 bytes

* feat(benchmark): add parametrized contract count and stub filtering to single-opcode tests

- Add parametrization for num_contracts [1, 5, 10, 20, 100]
- Implement stub prefix filtering based on test function name
- Add validation that raises an error when there are too few matching stubs
- Add SSTORE benchmark architecture documentation
- Create README with setup instructions and stubs.json format
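
The stub filtering and validation described above might look roughly like the following. Everything here is hypothetical: the stub names, the plain `dict` standing in for the real `AddressStubs` type, and the way the prefix is chosen are assumptions for illustration.

```python
def select_stubs(address_stubs: dict[str, str], prefix: str,
                 num_contracts: int) -> list[str]:
    """Return the first `num_contracts` stub names starting with `prefix`."""
    matching = sorted(name for name in address_stubs if name.startswith(prefix))
    if len(matching) < num_contracts:
        raise ValueError(
            f"need {num_contracts} stubs with prefix {prefix!r}, "
            f"found {len(matching)}"
        )
    return matching[:num_contracts]


# Hypothetical stub names and addresses, not the real stubs.json contents.
stubs = {
    "test_sload_empty_erc20_balanceof_A": "0x" + "11" * 20,
    "test_sload_empty_erc20_balanceof_B": "0x" + "22" * 20,
    "test_sstore_erc20_approve_A": "0x" + "33" * 20,
}
selected = select_stubs(stubs, "test_sload_empty_erc20_balanceof", 2)
print(selected)
```

Requesting more contracts than there are matching stubs raises `ValueError` in this sketch, which mirrors the validation step listed above.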

* fix(benchmark): add type annotations to test functions

* fix(benchmark): add AddressStubs type annotation to address_stubs parameter

* feat(benchmark): add parametrized contract count, stub filtering, and correct gas calculations

- Add num_contracts parametrization [1, 5, 10, 20, 100] to multi-opcode test
- Implement stub prefix filtering for all benchmarks
- Fix gas cost calculations to account for COLD/WARM account access
- CALL operations: first call to each contract is COLD (2600), subsequent are WARM (100)
- SSTORE operations: add cold storage access cost (2100) for zero-to-non-zero writes
- Update gas calculation formulas to solve for calls per contract correctly
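
The cold/warm accounting above changes how the number of calls per contract is solved for. A minimal sketch, assuming one cold CALL followed by warm CALLs and a fixed per-iteration cost that excludes the account access charge; the function names are hypothetical.

```python
COLD_ACCOUNT_ACCESS = 2_600  # first CALL to a contract
WARM_ACCESS = 100            # every subsequent CALL to the same contract
COLD_SLOAD = 2_100           # cold storage access surcharge
SSTORE_SET = 20_000          # zero-to-non-zero write


def total_gas(num_calls: int, other_cost_per_call: int) -> int:
    """Gas for `num_calls` CALLs to one contract: one cold, the rest warm."""
    if num_calls == 0:
        return 0
    access = COLD_ACCOUNT_ACCESS + (num_calls - 1) * WARM_ACCESS
    return access + num_calls * other_cost_per_call


def max_calls(budget: int, other_cost_per_call: int) -> int:
    """Largest n with total_gas(n) <= budget.

    total_gas(n) = (COLD - WARM) + n * (WARM + other), solved for n.
    """
    fixed = COLD_ACCOUNT_ACCESS - WARM_ACCESS
    if budget < fixed:
        return 0
    return (budget - fixed) // (WARM_ACCESS + other_cost_per_call)


# A zero-to-non-zero SSTORE on a cold slot costs the set cost plus the surcharge.
sstore_new_slot_gas = SSTORE_SET + COLD_SLOAD
n = max_calls(10_000, 900)
print(sstore_new_slot_gas, n, total_gas(n, 900))
```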

bal@v1.4.0

feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support (#2256)
