Skip to content

SP1 Hypercube baseline: prover instrumentation, load test fixes, and tooling#6155

Closed
avilagaston9 wants to merge 10 commits intomainfrom
bench/sp1-hypercube-baseline
Closed

SP1 Hypercube baseline: prover instrumentation, load test fixes, and tooling#6155
avilagaston9 wants to merge 10 commits intomainfrom
bench/sp1-hypercube-baseline

Conversation

@avilagaston9
Copy link
Contributor

Motivation

Preparation for benchmarking SP1 proving performance before upgrading to SP1 Hypercube. This branch includes the tooling changes needed to run reproducible proving benchmarks on the L2.

Description

Prover instrumentation:

  • Switch prover loop to use prove_timed() instead of prove(), logging batch number and elapsed time with structured fields
  • Add scripts/sp1_bench_metrics.sh to parse prover logs and build a CSV + summary table

Deployer fix:

  • Fix stale VK file paths in cmd/ethrex/l2/deployer.rs after the guest program was moved from crates/l2/prover/src/ethrex_guest_program/ to crates/guest-program/

Load test improvements:

  • Use BlockTag::Pending instead of Latest when fetching the starting nonce, so consecutive runs don't restart from nonce 0
  • wait_until_all_included now tracks per-account target nonces instead of a flat tx_amount
  • Add --endless flag for continuous load generation (infinite send+wait loop)

Test plan

  • Start L2 with --mempool.maxsize 200000, run load test, verify txs are included
  • Run load test twice consecutively, verify second run picks up correct nonces
  • Run with --endless flag, verify it loops automatically

Adds a comprehensive roadmap for ethrex L2 development covering:
- zkVM integration (ZisK, SP1, RISC0, OpenVM)
- Native and based rollups
- Security and audit work
- Aligned Layer integration
- Prover infrastructure
- Sequencer improvements
- Testing infrastructure
- Shared bridge (cross-L2 communication)
- Monitor and developer UX
- Documentation
- Contracts and standards compliance
- Tech debt cleanup
- L2Beat stage compliance tracking

Items are organized by category with status tracking (Done, In Progress,
Pending, Blocked, Research) and links to relevant issues/PRs.
… integration guide

Reorganized the L2 roadmap into 8 main sections: zkVM Integration (ZisK, SP1 Hypercube,
OpenVM), Native Rollups, Aligned Layer, Prover Infrastructure, Sequencer, Contracts,
Security & Audit, and Based Rollups. Grouped smaller issues under general items like
"Refactors & Cleanup" to keep the roadmap focused on high-level goals.

Moved the SP1 Hypercube integration guide from docs/l2/ to docs/prover/ where it fits
better with other prover documentation. Added entries to SUMMARY.md for both the
L2 Roadmap and the SP1 Hypercube guide.
Add comprehensive native_rollups.md covering EIP-8079 EXECUTE precompile,
benefits (governance-free upgrades, L1 security, reduced complexity),
implementation phases, technical challenges, and ethrex integration roadmap.

Expand L2 roadmap native rollups section with additional items (gap analysis,
precompile PoC, ultra sound design) and accurate timeline noting EIP-8079
is still in Draft status, not scheduled for Glamsterdam or Hegota forks.
…ty Checks entry

to reduce table verbosity while preserving all issue references
Switch the prover loop from prove() to prove_timed() so each batch logs
a structured line with batch number and elapsed proving time (seconds and
milliseconds). Add scripts/sp1_bench_metrics.sh that tails the prover log,
collects results into a CSV, and prints a summary table on exit.
… of prover

The guest program was moved from crates/l2/prover/src/ethrex_guest_program/
to crates/guest-program/, but the fallback VK paths in the deployer were not
updated. This caused deploy-l1-sp1 to fail with "No such file or directory"
when running outside Docker. CI was unaffected because it passes explicit
VK paths via ETHREX_SP1_VERIFICATION_KEY_PATH in docker-compose.yaml.
…nce conflicts,

and add --endless flag for continuous load generation. The load_test function now
fetches the pending nonce (instead of latest) so re-runs pick up where the previous
round left off. wait_until_all_included now tracks per-account target nonces instead
of a flat tx_amount, which was incorrect for any run where the starting nonce was
not zero.
@github-actions
Copy link

github-actions bot commented Feb 6, 2026

🤖 Kimi Code Review

Security Review

No critical or high-severity security issues identified.

Bug Review

Medium-severity bug found:

File: cmd/ethrex/l2/deployer.rs
Lines: 1018, 1023, 1028

The hard-coded relative paths in get_vk() use ../../crates/guest-program/bin/ but the actual directory structure appears to be ../../crates/l2/prover/src/ethrex_guest_program/src/... based on the original paths. This path change could cause the deployer to fail to find verification keys, leading to deployment failures.

Recommendation: Verify the actual directory structure matches these new paths or revert to the original paths.

Performance Review

No significant performance issues identified. The changes appear to be infrastructure improvements (SP1 Hypercube integration, native rollups documentation) rather than performance regressions.


Automated review by Kimi (Moonshot AI)

@github-actions
Copy link

github-actions bot commented Feb 6, 2026

🤖 Claude Code Review

Now let me verify that percentage calculation in load test is safe from division by zero.

Based on my thorough review of this PR, here's my analysis:


PR #6155 Review: SP1 Hypercube Baseline

Summary

This PR adds prover instrumentation for benchmarking, fixes deployer VK file paths, and improves the load test tool. The changes are mostly low-risk with one bug finding.

Security Vulnerabilities

None found.

Bugs

Medium - Potential Division by Zero in Load Test Percentage Calculation

File: tooling/load_test/src/main.rs:296

(nonce as f64 / target_nonce as f64) * 100.0

Issue: If target_nonce is zero (when tx_amount is 0), this will result in a division by zero, causing a panic or infinity value.

Impact: While unlikely in normal operation (since users typically specify tx_amount > 0), this could crash the load test tool if misconfigured.

Recommendation: Add a check:

let percentage = if target_nonce > 0 {
    (nonce as f64 / target_nonce as f64) * 100.0
} else {
    0.0
};
println!(
    "Waiting for transactions to be included from {encoded_src}. Nonce: {nonce}. Target: {target_nonce}. Percentage: {:.2}%.",
    percentage
);

Low - Bash Script Arithmetic Edge Case

File: scripts/sp1_bench_metrics.sh:60-61

((ms < min)) && min=$ms
((ms > max)) && max=$ms

Issue: If ms is 0, the comparison ((ms < min)) could fail in some bash versions when min=999999999 due to arithmetic evaluation context. However, this is unlikely since proving times should never be 0ms.

Impact: Minimal - edge case that shouldn't occur in practice.

Recommendation: No action required, but could add explicit handling if defensive coding is desired.


Significant Performance Issues

None found.

The changes actually improve performance observability by adding timing instrumentation. The load test improvements (using Pending nonce instead of Latest) should improve consecutive test runs.


Additional Observations (Non-blocking)

Positive Changes

  1. Prover timing instrumentation (prover.rs:102-112) - Clean implementation using the existing prove_timed method
  2. VK path fixes (deployer.rs:10,17,23) - Correctly updates paths after guest program relocation
  3. Load test nonce handling (load_test/src/main.rs:229) - Using BlockTag::Pending instead of Latest fixes the issue where consecutive runs would restart from nonce 0
  4. Target nonce tracking - More accurate than tracking a flat tx_amount

Documentation

The PR adds extensive documentation about SP1 Hypercube and Native Rollups. These are markdown files with no security implications.

Code Quality

  • The bash script uses set -euo pipefail which is good practice
  • Signal handling with trap is properly implemented
  • The Rust changes follow existing patterns and coding style

Conclusion

Overall Assessment: The PR is in good shape with one medium-severity bug that should be fixed before merging.

Required Changes:

  • Fix the potential division by zero in load_test percentage calculation

Recommendation: Approve after fixing the division by zero issue.


Automated review by Claude (Anthropic)

@github-actions
Copy link

github-actions bot commented Feb 6, 2026

🤖 Codex Code Review

No security vulnerabilities or significant performance issues found in the diff.

Bugs

  • Medium: Load test can wait indefinitely if there are pre-existing pending transactions for an account. You now seed target_nonce from the pending nonce, but you wait for the latest/confirmed nonce to reach that target, which includes any older pending txs not sent by this run. If those never confirm, the loop never exits. Consider basing target_nonce on latest (and then sending from latest), or keep pending but track only the nonces you actually submit in this run and wait for those specifically. tooling/load_test/src/main.rs:211, tooling/load_test/src/main.rs:285

Automated review by OpenAI Codex

@github-actions
Copy link

github-actions bot commented Feb 6, 2026

Lines of code report

Total lines added: 10
Total lines removed: 0
Total lines changed: 10

Detailed view
+---------------------------------------+-------+------+
| File                                  | Lines | Diff |
+---------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/prover.rs | 195   | +10  |
+---------------------------------------+-------+------+

…ndpoint.

The script now fetches batch_gas_used, batch_tx_count, and batch_size from
the L2 metrics endpoint (localhost:3702/metrics) for each proved batch, joining
them with proving time into a single CSV and summary table.
…ing.

It now parses all proving_time lines from the file, fetches batch metadata
from Prometheus, prints the table, and exits immediately.
…last]

contains last - first + 1 blocks, not last - first. A single-block batch
was reported as size 0.
@avilagaston9
Copy link
Contributor Author

Superseded by #6157

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant