SP1 Hypercube baseline: prover instrumentation, load test fixes, and tooling by avilagaston9 · Pull Request #6155 · lambdaclass/ethrex

avilagaston9 · 2026-02-06T21:11:33Z

Motivation

Preparation for benchmarking SP1 proving performance before upgrading to SP1 Hypercube. This branch includes the tooling changes needed to run reproducible proving benchmarks on the L2.

Description

Prover instrumentation:

Switch prover loop to use prove_timed() instead of prove(), logging batch number and elapsed time with structured fields
Add scripts/sp1_bench_metrics.sh to parse prover logs and build a CSV + summary table

Deployer fix:

Fix stale VK file paths in cmd/ethrex/l2/deployer.rs after the guest program was moved from crates/l2/prover/src/ethrex_guest_program/ to crates/guest-program/

Load test improvements:

Use BlockTag::Pending instead of Latest when fetching the starting nonce, so consecutive runs don't restart from nonce 0
wait_until_all_included now tracks per-account target nonces instead of a flat tx_amount
Add --endless flag for continuous load generation (infinite send+wait loop)

Test plan

Start L2 with --mempool.maxsize 200000, run load test, verify txs are included
Run load test twice consecutively, verify second run picks up correct nonces
Run with --endless flag, verify it loops automatically

Adds a comprehensive roadmap for ethrex L2 development covering: - zkVM integration (ZisK, SP1, RISC0, OpenVM) - Native and based rollups - Security and audit work - Aligned Layer integration - Prover infrastructure - Sequencer improvements - Testing infrastructure - Shared bridge (cross-L2 communication) - Monitor and developer UX - Documentation - Contracts and standards compliance - Tech debt cleanup - L2Beat stage compliance tracking Items are organized by category with status tracking (Done, In Progress, Pending, Blocked, Research) and links to relevant issues/PRs.

… integration guide Reorganized the L2 roadmap into 8 main sections: zkVM Integration (ZisK, SP1 Hypercube, OpenVM), Native Rollups, Aligned Layer, Prover Infrastructure, Sequencer, Contracts, Security & Audit, and Based Rollups. Grouped smaller issues under general items like "Refactors & Cleanup" to keep the roadmap focused on high-level goals. Moved the SP1 Hypercube integration guide from docs/l2/ to docs/prover/ where it fits better with other prover documentation. Added entries to SUMMARY.md for both the L2 Roadmap and the SP1 Hypercube guide.

Add comprehensive native_rollups.md covering EIP-8079 EXECUTE precompile, benefits (governance-free upgrades, L1 security, reduced complexity), implementation phases, technical challenges, and ethrex integration roadmap. Expand L2 roadmap native rollups section with additional items (gap analysis, precompile PoC, ultra sound design) and accurate timeline noting EIP-8079 is still in Draft status, not scheduled for Glamsterdam or Hegota forks.

…ty Checks entry to reduce table verbosity while preserving all issue references

Switch the prover loop from prove() to prove_timed() so each batch logs a structured line with batch number and elapsed proving time (seconds and milliseconds). Add scripts/sp1_bench_metrics.sh that tails the prover log, collects results into a CSV, and prints a summary table on exit.

… of prover The guest program was moved from crates/l2/prover/src/ethrex_guest_program/ to crates/guest-program/, but the fallback VK paths in the deployer were not updated. This caused deploy-l1-sp1 to fail with "No such file or directory" when running outside Docker. CI was unaffected because it passes explicit VK paths via ETHREX_SP1_VERIFICATION_KEY_PATH in docker-compose.yaml.

…nce conflicts, and add --endless flag for continuous load generation. The load_test function now fetches the pending nonce (instead of latest) so re-runs pick up where the previous round left off. wait_until_all_included now tracks per-account target nonces instead of a flat tx_amount, which was incorrect for any run where the starting nonce was not zero.

github-actions · 2026-02-06T21:11:53Z

🤖 Kimi Code Review

Security Review

No critical or high-severity security issues identified.

Bug Review

Medium-severity bug found:

File: cmd/ethrex/l2/deployer.rs
Lines: 1018, 1023, 1028

The hard-coded relative paths in get_vk() use ../../crates/guest-program/bin/ but the actual directory structure appears to be ../../crates/l2/prover/src/ethrex_guest_program/src/... based on the original paths. This path change could cause the deployer to fail to find verification keys, leading to deployment failures.

Recommendation: Verify the actual directory structure matches these new paths or revert to the original paths.

Performance Review

No significant performance issues identified. The changes appear to be infrastructure improvements (SP1 Hypercube integration, native rollups documentation) rather than performance regressions.

Automated review by Kimi (Moonshot AI)

github-actions · 2026-02-06T21:13:04Z

🤖 Claude Code Review

Now let me verify that percentage calculation in load test is safe from division by zero.

Based on my thorough review of this PR, here's my analysis:

PR #6155 Review: SP1 Hypercube Baseline

Summary

This PR adds prover instrumentation for benchmarking, fixes deployer VK file paths, and improves the load test tool. The changes are mostly low-risk with one bug finding.

Security Vulnerabilities

None found.

Bugs

Medium - Potential Division by Zero in Load Test Percentage Calculation

File: tooling/load_test/src/main.rs:296

(nonce as f64 / target_nonce as f64) * 100.0

Issue: If target_nonce is zero (when tx_amount is 0), this will result in a division by zero, causing a panic or infinity value.

Impact: While unlikely in normal operation (since users typically specify tx_amount > 0), this could crash the load test tool if misconfigured.

Recommendation: Add a check:

let percentage = if target_nonce > 0 {
    (nonce as f64 / target_nonce as f64) * 100.0
} else {
    0.0
};
println!(
    "Waiting for transactions to be included from {encoded_src}. Nonce: {nonce}. Target: {target_nonce}. Percentage: {:.2}%.",
    percentage
);

Low - Bash Script Arithmetic Edge Case

File: scripts/sp1_bench_metrics.sh:60-61

((ms < min)) && min=$ms
((ms > max)) && max=$ms

Issue: If ms is 0, the comparison ((ms < min)) could fail in some bash versions when min=999999999 due to arithmetic evaluation context. However, this is unlikely since proving times should never be 0ms.

Impact: Minimal - edge case that shouldn't occur in practice.

Recommendation: No action required, but could add explicit handling if defensive coding is desired.

Significant Performance Issues

None found.

The changes actually improve performance observability by adding timing instrumentation. The load test improvements (using Pending nonce instead of Latest) should improve consecutive test runs.

Additional Observations (Non-blocking)

Positive Changes

Prover timing instrumentation (prover.rs:102-112) - Clean implementation using the existing prove_timed method
VK path fixes (deployer.rs:10,17,23) - Correctly updates paths after guest program relocation
Load test nonce handling (load_test/src/main.rs:229) - Using BlockTag::Pending instead of Latest fixes the issue where consecutive runs would restart from nonce 0
Target nonce tracking - More accurate than tracking a flat tx_amount

Documentation

The PR adds extensive documentation about SP1 Hypercube and Native Rollups. These are markdown files with no security implications.

Code Quality

The bash script uses set -euo pipefail which is good practice
Signal handling with trap is properly implemented
The Rust changes follow existing patterns and coding style

Conclusion

Overall Assessment: The PR is in good shape with one medium-severity bug that should be fixed before merging.

Required Changes:

Fix the potential division by zero in load_test percentage calculation

Recommendation: Approve after fixing the division by zero issue.

Automated review by Claude (Anthropic)

github-actions · 2026-02-06T21:13:16Z

🤖 Codex Code Review

No security vulnerabilities or significant performance issues found in the diff.

Bugs

Medium: Load test can wait indefinitely if there are pre-existing pending transactions for an account. You now seed target_nonce from the pending nonce, but you wait for the latest/confirmed nonce to reach that target, which includes any older pending txs not sent by this run. If those never confirm, the loop never exits. Consider basing target_nonce on latest (and then sending from latest), or keep pending but track only the nonces you actually submit in this run and wait for those specifically. tooling/load_test/src/main.rs:211, tooling/load_test/src/main.rs:285

Automated review by OpenAI Codex

github-actions · 2026-02-06T21:13:59Z

Lines of code report

Total lines added: 10
Total lines removed: 0
Total lines changed: 10

Detailed view

+---------------------------------------+-------+------+
| File                                  | Lines | Diff |
+---------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/prover.rs | 195   | +10  |
+---------------------------------------+-------+------+

…ndpoint. The script now fetches batch_gas_used, batch_tx_count, and batch_size from the L2 metrics endpoint (localhost:3702/metrics) for each proved batch, joining them with proving time into a single CSV and summary table.

…ing. It now parses all proving_time lines from the file, fetches batch metadata from Prometheus, prints the table, and exits immediately.

…last] contains last - first + 1 blocks, not last - first. A single-block batch was reported as size 0.

avilagaston9 · 2026-02-24T13:33:00Z

Superseded by #6157

avilagaston9 added 7 commits February 3, 2026 16:25

Compact last three Security & Audit items into single Internal Securi…

f653cd8

…ty Checks entry to reduce table verbosity while preserving all issue references

github-actions bot assigned avilagaston9 Feb 6, 2026

avilagaston9 added 3 commits February 6, 2026 18:16

Make benchmark script read the log file once and exit instead of tail…

2b64d8f

…ing. It now parses all proving_time lines from the file, fetches batch metadata from Prometheus, prints the table, and exits immediately.

Fix off-by-one in batch_size metric: a batch spanning blocks [first, …

f9cecbb

…last] contains last - first + 1 blocks, not last - first. A single-block batch was reported as size 0.

avilagaston9 closed this Feb 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SP1 Hypercube baseline: prover instrumentation, load test fixes, and tooling#6155

SP1 Hypercube baseline: prover instrumentation, load test fixes, and tooling#6155
avilagaston9 wants to merge 10 commits intomainfrom
bench/sp1-hypercube-baseline

avilagaston9 commented Feb 6, 2026

Uh oh!

github-actions bot commented Feb 6, 2026

Uh oh!

github-actions bot commented Feb 6, 2026

Uh oh!

github-actions bot commented Feb 6, 2026

Uh oh!

github-actions bot commented Feb 6, 2026

Uh oh!

avilagaston9 commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avilagaston9 commented Feb 6, 2026

Motivation

Description

Test plan

Uh oh!

github-actions bot commented Feb 6, 2026

🤖 Kimi Code Review

Security Review

Bug Review

Performance Review

Uh oh!

github-actions bot commented Feb 6, 2026

🤖 Claude Code Review

PR #6155 Review: SP1 Hypercube Baseline

Summary

Security Vulnerabilities

Bugs

Medium - Potential Division by Zero in Load Test Percentage Calculation

Low - Bash Script Arithmetic Edge Case

Significant Performance Issues

Additional Observations (Non-blocking)

Positive Changes

Documentation

Code Quality

Conclusion

Uh oh!

github-actions bot commented Feb 6, 2026

🤖 Codex Code Review

Uh oh!

github-actions bot commented Feb 6, 2026

Lines of code report

Uh oh!

avilagaston9 commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant