feat(l2): add prover benchmarking tooling and documentation #6157

avilagaston9 merged 37 commits into `main`.
Conversation
Switch the prover loop from prove() to prove_timed() so each batch logs a structured line with batch number and elapsed proving time (seconds and milliseconds). Add scripts/sp1_bench_metrics.sh that tails the prover log, collects results into a CSV, and prints a summary table on exit.
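The timing wrapper described above can be sketched as follows. This is a minimal, self-contained model, not the actual backend code: `prove` here is a hypothetical stand-in for the real backend call, and the log line only mirrors the structured fields named in the PR (`proving_time_s`, `proving_time_ms`).

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the backend's prove() call.
fn prove(_batch_number: u64) -> Result<Vec<u8>, String> {
    Ok(vec![0u8; 32]) // placeholder proof bytes
}

// Sketch of prove_timed(): wrap prove() and return the elapsed time
// alongside the output so the caller can emit a structured log line.
fn prove_timed(batch_number: u64) -> Result<(Vec<u8>, Duration), String> {
    let start = Instant::now();
    let output = prove(batch_number)?;
    Ok((output, start.elapsed()))
}

fn main() {
    let (proof, elapsed) = prove_timed(7).unwrap();
    // Structured line with both seconds and milliseconds, as in the PR.
    println!(
        "proved batch=7 proving_time_s={:.3} proving_time_ms={}",
        elapsed.as_secs_f64(),
        u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX)
    );
    assert_eq!(proof.len(), 32);
}
```

Returning the duration instead of logging inside the wrapper keeps the timing logic separate from the log formatting, which is what lets the script below parse one predictable line per batch.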
… of prover. The guest program was moved from crates/l2/prover/src/ethrex_guest_program/ to crates/guest-program/, but the fallback VK paths in the deployer were not updated. This caused deploy-l1-sp1 to fail with "No such file or directory" when running outside Docker. CI was unaffected because it passes explicit VK paths via ETHREX_SP1_VERIFICATION_KEY_PATH in docker-compose.yaml.
…nce conflicts, and add --endless flag for continuous load generation. The load_test function now fetches the pending nonce (instead of latest) so re-runs pick up where the previous round left off. wait_until_all_included now tracks per-account target nonces instead of a flat tx_amount, which was incorrect for any run where the starting nonce was not zero.
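The per-account target tracking described in this commit can be sketched like this. It is a simplified, synchronous model (the real code uses an async RPC client); `target_nonces` and `all_included` are hypothetical helper names.

```rust
use std::collections::HashMap;

// For each account, the target nonce is its starting (pending) nonce plus
// the number of txs sent this round -- not a flat tx_amount, which is only
// correct when every account starts at nonce 0.
fn target_nonces(start_nonces: &HashMap<&str, u64>, tx_amount: u64) -> HashMap<String, u64> {
    start_nonces
        .iter()
        .map(|(addr, start)| (addr.to_string(), start + tx_amount))
        .collect()
}

// All txs are included once every account's current nonce reaches its target.
fn all_included(targets: &HashMap<String, u64>, current: &HashMap<String, u64>) -> bool {
    targets
        .iter()
        .all(|(addr, t)| current.get(addr).is_some_and(|c| c >= t))
}

fn main() {
    // One account starts at nonce 5 (a re-run), one at 0 (fresh).
    let starts = HashMap::from([("0xabc", 5u64), ("0xdef", 0u64)]);
    let targets = target_nonces(&starts, 10);
    assert_eq!(targets["0xabc"], 15); // 5 + 10, not a flat 10

    let current = HashMap::from([
        ("0xabc".to_string(), 15u64),
        ("0xdef".to_string(), 10u64),
    ]);
    assert!(all_included(&targets, &current));
}
```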
…ndpoint. The script now fetches batch_gas_used, batch_tx_count, and batch_size from the L2 metrics endpoint (localhost:3702/metrics) for each proved batch, joining them with proving time into a single CSV and summary table.
…ing. It now parses all proving_time lines from the file, fetches batch metadata from Prometheus, prints the table, and exits immediately.
…last] contains last - first + 1 blocks, not last - first. A single-block batch was reported as size 0.
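A minimal illustration of the inclusive-range fix (hypothetical helper name):

```rust
// An inclusive range [first, last] contains last - first + 1 blocks.
fn batch_size(first_block: u64, last_block: u64) -> u64 {
    last_block - first_block + 1
}

fn main() {
    // A single-block batch was previously reported as size 0.
    assert_eq!(batch_size(100, 100), 1);
    assert_eq!(batch_size(100, 104), 5);
}
```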
…ment. When enabled (default), the prover wraps each prove() call with timing and logs structured fields (proving_time_s, proving_time_ms). When disabled with --no-timed, the prover calls prove() directly without timing overhead. Also fixes a clippy as_conversions warning in the elapsed.as_millis() cast.
…ess mode. The clap args now accept LOAD_TEST_RPC_URL, LOAD_TEST_TX_AMOUNT, and LOAD_TEST_ENDLESS environment variables as alternatives to --node, --tx-amount, and --endless flags respectively. CLI flags take precedence over env vars.
The guide (docs/l2/prover-benchmarking.md) explains how to use the localnet, prover timing, load test, and benchmark script together to measure proving performance. The agent workflow (docs/workflows/prover_benchmarking.md) provides step-by-step instructions for running a remote benchmark session, following the same format as the existing execution witness benchmarking workflow.
🤖 Kimi Code Review

**Review Summary**

Overall, this PR introduces useful benchmarking infrastructure for the L2 prover with good observability improvements. The changes are mostly well-structured, but there are a few issues to address.

**Issues Found**

1. Potential Overflow in …
🤖 Claude Code Review

Based on my review of PR #6157, here's my comprehensive code review:

**Code Review: PR #6157 - Extract Prover Benchmarking Tooling**

**Summary**

This PR adds benchmarking infrastructure for proving performance, including timing instrumentation, load test improvements, and documentation. The changes are mostly low-risk tooling enhancements.

**Critical Issues**

1. Bash Script: Potential Integer Overflow (scripts/sp1_bench_metrics.sh:85-93)

```bash
count=0; total=0; min=999999999; max=0; total_gas=0; total_txs=0
```

Issue: Bash integer arithmetic can overflow for large millisecond values if proving takes hours.

Fix: Add overflow protection:

```bash
if [[ $total -gt 9223372036854775807 ]]; then
    echo "Warning: overflow detected in timing calculations"
fi
```

2. Bash Script: Division by Zero Risk (scripts/sp1_bench_metrics.sh:97)

```bash
avg=$((total / count))
```

Issue: While protected by the surrounding count check, the guard is implicit.

Status: Actually safe, but consider making it more explicit.

**High Priority Issues**

3. Load Test: Unwrap Without Error Context (tooling/load_test/src/main.rs:232-234)

```rust
let nonce = client
    .get_nonce(account.address(), BlockIdentifier::Tag(BlockTag::Pending))
    .await
    .unwrap();
```

Issue: `.unwrap()` panics without context if the RPC call fails.

Fix:

```rust
.await
.context("Failed to fetch pending nonce")?;
```

4. Load Test: Missing Error Propagation Context (tooling/load_test/src/main.rs:285-288)

```rust
let nonce = client
    .get_nonce(src, BlockIdentifier::Tag(BlockTag::Latest))
    .await
    .unwrap();
```

Issue: Same as above in the wait loop.

**Medium Priority Issues**

5. Prover: Potential U64 Overflow in Milliseconds (crates/l2/prover/src/prover.rs:110)

```rust
proving_time_ms = u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX),
```

Issue: Using `unwrap_or(u64::MAX)` silently saturates on overflow.

Recommendation: This is practically fine (proving won't take that long), but consider logging a warning on saturation for correctness:

```rust
proving_time_ms = u64::try_from(elapsed.as_millis())
    .inspect_err(|_| warn!("Proving time exceeds u64::MAX ms, clamping"))
    .unwrap_or(u64::MAX),
```

6. Bash Script: Inefficient Metric Fetching (scripts/sp1_bench_metrics.sh:30-37)

```bash
fetch_metric() {
    local metric="$1" batch="$2"
    curl -s "$METRICS_URL" 2>/dev/null \
        | grep "^${metric}{" \
        | grep "batch_number=\"${batch}\"" \
        | awk '{print $2}' \
        | head -1
}
```

Issue: This fetches the entire metrics endpoint for EACH metric for EACH batch (3 metrics per batch). For 100 batches, this makes 300 HTTP requests.

Fix: Fetch metrics once and cache:

```bash
METRICS_CACHE=$(curl -s "$METRICS_URL" 2>/dev/null)
fetch_metric() {
    echo "$METRICS_CACHE" | grep "^${1}{" | grep "batch_number=\"${2}\"" | awk '{print $2}' | head -1
}
```

7. Load Test: Nonce Race Condition (tooling/load_test/src/main.rs:232)

```rust
let nonce = client
    .get_nonce(account.address(), BlockIdentifier::Tag(BlockTag::Pending))
    .await
    .unwrap();
```

Issue: Using the pending nonce can race if another process submits transactions for the same accounts concurrently.

Status: Acceptable for single-instance testing. Document this limitation in the guide or add a warning.

**Low Priority / Style Issues**

8. Load Test: Clone in Tight Loop (tooling/load_test/src/main.rs:240-260)

```rust
for i in 0..tx_amount {
    let (value, calldata, dst) = tx_builder.build_tx();
    // ...
    sleep(Duration::from_micros(800)).await;
}
```

Status: Likely negligible impact. The sleep dominates timing.

9. Prover: Duplicate Log Messages (crates/l2/prover/src/prover.rs:111-124)

```rust
"Proved batch {} in {:.2?}",
prover_data.batch_number,
elapsed
// vs
"Proved batch {}", prover_data.batch_number
```

Suggestion: The two branches have similar messages. Consider extracting common logging:

```rust
let batch_proof = if self.timed {
    let (output, elapsed) = self.backend.prove_timed(...)?;
    info!(batch = ..., proving_time_s = ..., proving_time_ms = ..., "Proved batch {} in {:.2?}", ...);
    output
} else {
    let output = self.backend.prove(...)?;
    info!(batch = ..., "Proved batch {}", ...);
    output
};
self.backend.to_batch_proof(output, prover_data.format)
```

10. Documentation: Missing Batch Size Explanation (docs/l2/prover-benchmarking.md:127)

| `batch_size{batch_number="N"}` | Number of blocks in batch N |

Clarification: Given the fix in l1_committer.rs:976, confirm this metric now correctly represents block count (inclusive).

**Positive Observations**

✅ Off-by-one fix is correct (crates/l2/sequencer/l1_committer.rs:976)
✅ VK path updates are correct (cmd/ethrex/l2/deployer.rs:1018-1028)
✅ Timed flag design is clean
✅ Load test improvements are valuable
✅ Documentation is thorough
✅ ProverBackend trait usage is correct

**Security Considerations**

No critical security issues found. This is primarily tooling/benchmarking code.

**Performance Considerations**

**Recommendations**

**Verdict**

Approved with minor suggestions. The code is well-structured, the documentation is excellent, and the changes are low-risk. The main concerns are error handling in the load test (easy fix) and bash script efficiency (minor optimization).

*Automated review by Claude (Anthropic) · custom prompt*
🤖 Codex Code Review

**Findings (ordered by severity)**

**Other notes**

If you want, I can propose concrete patches for the two items above.

*Automated review by OpenAI Codex · custom prompt*
… or in a loop depending on --endless, instead of a loop-with-break pattern.
…lient, tx_builder, and accounts instead of owned values, removing unnecessary clones at the call sites in run_round.
This removes the need for --no-timed; users pass --timed when they want proving time measurement, and omit it otherwise.
… both the default and GPU-enabled cargo run commands.
--timed can be passed via make (e.g. make init-prover-sp1 PROVER_ARGS=--timed). Update docs to use the Makefile commands with GPU=true and PROVER_ARGS examples.
… PROVER_ARGS, so timing is enabled with TIMED=true (e.g. make init-prover-sp1 GPU=true TIMED=true).
…alias. Clap already reads the env var, so no Makefile plumbing is needed. Reverts the TIMED/PROVER_ARGS additions to the Makefile targets.
…ummary instead of CSV. Add load-test Makefile target with env var configuration. Update docs to use Makefile commands throughout, add mempool limit warning, note that L2 must be running when collecting results, and reference PROVER_CLIENT_TIMED env var instead of --timed flag.
… root Makefile target instead. Update docs to run make load-test from the repo root.
… script. The report now includes a Server Specs section with hardware info detected from /proc/cpuinfo, /proc/meminfo, sysctl, and nvidia-smi.
… user edits to prover benchmarking guide.
… by default) in the prover benchmarking workflow.
🤖 Kimi Code Review

**Review Summary**

This PR introduces prover benchmarking capabilities and fixes several issues in the L2 prover system. Here are the key findings:

✅ Good Changes
🤖 Codex Code Review

**Findings (ordered by severity)**

**Notes / minor observations**

If you want, I can dig into …

*Automated review by OpenAI Codex · custom prompt*
🤖 Claude Code Review

Now I have enough context to provide a comprehensive review. Let me compile my findings:

**PR Review: feat(l2): add prover benchmarking tooling and documentation**

**Overview**

This PR adds comprehensive tooling for reproducible prover benchmarking, including timing instrumentation, metrics collection, load testing improvements, and documentation. The changes are well-structured and the feature additions are valuable for performance measurement.

**Critical Issues**

1. Proof Coordinator Logic Change - Potential Race Condition Risk
Pull request overview
This pull request adds comprehensive tooling and documentation for benchmarking prover performance on the ethrex L2. The changes enable reproducible and ergonomic prover benchmarking workflows by adding structured timing logs, a results collection script, load test improvements, and detailed documentation for both manual and agent-assisted workflows.
Changes:
- Adds prover instrumentation with `prove_timed()` method and `--timed` CLI flag for measuring proving time
- Enhances proof coordinator to filter provers by type, preventing unnecessary work assignments when specific proof types aren't needed
- Improves load test with pending nonce support, endless mode, and environment variable configuration for easier automation
- Fixes two bugs: VK file paths after guest program relocation and off-by-one error in batch_size metric
- Adds comprehensive documentation for prover benchmarking workflows
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tooling/load_test/src/main.rs | Adds endless mode, uses pending nonces for consecutive runs, adds env var support, refactors into run_round function |
| scripts/bench_metrics.sh | New benchmark results collection script that parses prover logs and enriches with Prometheus batch metadata |
| docs/workflows/prover_benchmarking.md | New agent workflow documentation for remote server benchmarking |
| docs/l2/prover-benchmarking.md | New user guide for prover benchmarking on local/remote setups |
| crates/l2/prover/src/prover.rs | Adds timed proving support controlled by --timed flag |
| crates/l2/prover/src/config.rs | Adds timed field to ProverConfig |
| crates/l2/prover/src/backend/mod.rs | Adds prover_type() method to ProverBackend trait |
| crates/l2/prover/src/backend/*.rs | Implements prover_type() method for all backend implementations |
| crates/l2/sequencer/proof_coordinator.rs | Adds prover type filtering logic to skip unnecessary work assignments |
| crates/l2/common/src/prover.rs | Adds prover_type field to BatchRequest and new ProverTypeNotNeeded response |
| crates/l2/tee/quote-gen/src/sender.rs | Updates get_batch to include prover_type: ProverType::TDX in BatchRequest |
| crates/l2/sequencer/l1_committer.rs | Fixes off-by-one error in batch_size metric calculation |
| cmd/ethrex/l2/deployer.rs | Fixes VK file paths after guest program relocation to crates/guest-program/bin |
| cmd/ethrex/l2/options.rs | Adds --timed CLI option for prover client |
| crates/l2/Makefile | Simplifies deploy-l1-sp1 target to use --sp1 flag |
Comments suppressed due to low confidence (1)
crates/l2/tee/quote-gen/src/sender.rs:52
- The `get_batch` function does not handle the new `ProofData::ProverTypeNotNeeded` response variant that was added to the protocol. When a TDX prover connects but TDX proofs are not needed by the coordinator, it will send back `ProverTypeNotNeeded`, but this function will treat it as an unexpected response and return a generic error "Expecting ProofData::Response". Add a match arm to handle `ProofData::ProverTypeNotNeeded` and return a more descriptive error message.
```rust
pub async fn get_batch(commit_hash: String) -> Result<(u64, ProgramInput), String> {
    let batch = connect_to_prover_server_wr(&ProofData::BatchRequest {
        commit_hash: commit_hash.clone(),
        prover_type: ProverType::TDX,
    })
    .await
    .map_err(|e| format!("Failed to get Response: {e}"))?;
    match batch {
        ProofData::BatchResponse {
            batch_number,
            input,
            ..
        } => match (batch_number, input) {
            (Some(batch_number), Some(input)) => {
                #[cfg(feature = "l2")]
                let input = ProgramInput {
                    blocks: input.blocks,
                    execution_witness: input.execution_witness,
                    elasticity_multiplier: input.elasticity_multiplier,
                    blob_commitment: input.blob_commitment,
                    blob_proof: input.blob_proof,
                    fee_configs: input.fee_configs,
                };
                #[cfg(not(feature = "l2"))]
                let input = ProgramInput {
                    blocks: input.blocks,
                    execution_witness: input.execution_witness,
                };
                Ok((batch_number, input))
            }
            _ => Err("No blocks to prove.".to_owned()),
        },
        ProofData::NoBatchForVersion {
            commit_hash: server_code_version,
        } => Err(format!(
            "Next batch does not match with the current version. Server code: {}, Prover code: {}",
            server_code_version, commit_hash
        )),
        _ => Err("Expecting ProofData::Response".to_owned()),
    }
}
```
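The suggested match arm could look like the sketch below. This is a simplified, self-contained model: the `ProofData` enum here is a minimal stand-in for the real protocol type in crates/l2/common, and the error string is illustrative.

```rust
// Minimal stand-in for the protocol enum, just enough to show the
// suggested match arm (the real ProofData lives in crates/l2/common).
enum ProofData {
    BatchResponse { batch_number: Option<u64> },
    ProverTypeNotNeeded,
    Other,
}

fn handle(batch: ProofData) -> Result<u64, String> {
    match batch {
        ProofData::BatchResponse { batch_number: Some(n) } => Ok(n),
        ProofData::BatchResponse { batch_number: None } => {
            Err("No blocks to prove.".to_owned())
        }
        // Suggested addition: give the TDX prover a descriptive error
        // instead of the generic "Expecting ProofData::Response".
        ProofData::ProverTypeNotNeeded => {
            Err("Coordinator does not need TDX proofs for this deployment.".to_owned())
        }
        _ => Err("Expecting ProofData::Response".to_owned()),
    }
}

fn main() {
    assert!(handle(ProofData::ProverTypeNotNeeded).unwrap_err().contains("TDX"));
    assert_eq!(handle(ProofData::BatchResponse { batch_number: Some(3) }), Ok(3));
    let _ = handle(ProofData::Other);
}
```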
Greptile Overview

**Greptile Summary**

This PR adds reproducible prover benchmarking support across the L2 stack: new optional timed proving instrumentation (controlled by `--timed` / `PROVER_CLIENT_TIMED`), …

On the coordinator side, …

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| cmd/ethrex/l2/deployer.rs | Updates default VK file paths for RISC0/SP1 to new guest-program location; otherwise unchanged. |
| cmd/ethrex/l2/options.rs | Adds --timed / PROVER_CLIENT_TIMED flag to prover client options and threads it into ProverConfig. |
| crates/l2/Makefile | Adds/adjusts localnet/prover convenience targets and env exports for benchmarking workflow. |
| crates/l2/common/src/prover.rs | Extends BatchRequest with prover_type so coordinator can filter assignments; appears consistent with coordinator usage. |
| crates/l2/prover/src/backend/exec.rs | Implements prover_type() for exec backend and related instrumentation hooks. |
| crates/l2/prover/src/backend/mod.rs | Adds prover_type() to ProverBackend trait and propagates backend parsing; enables coordinator filtering by backend type. |
| crates/l2/prover/src/backend/openvm.rs | Adds ProverBackend::prover_type() but currently unimplemented!() causing runtime panic if openvm backend is selected. |
| crates/l2/prover/src/backend/risc0.rs | Implements prover_type() for Risc0 backend; no issues found. |
| crates/l2/prover/src/backend/sp1.rs | Implements prover_type() for SP1 backend; no issues found. |
| crates/l2/prover/src/backend/zisk.rs | Adds ProverBackend::prover_type() but currently unimplemented!() causing runtime panic if zisk backend is selected. |
| crates/l2/prover/src/config.rs | Adds timed field to ProverConfig for controlling proving-time instrumentation. |
| crates/l2/prover/src/prover.rs | Wires new prover_type into BatchRequest and optional timed proving; backend-specific prover_type() panics remain a risk. |
| crates/l2/sequencer/l1_committer.rs | Fixes batch_size metric off-by-one; no issues found. |
| crates/l2/sequencer/proof_coordinator.rs | Handles prover_type in BatchRequest and skips batches for unneeded prover types; logic appears consistent with new request format. |
| crates/l2/tee/quote-gen/src/sender.rs | Adds prover_type=TDX in batch requests to coordinator; aligns with coordinator filtering behavior. |
| docs/l2/prover-benchmarking.md | Adds benchmarking guide documentation. |
| docs/workflows/prover_benchmarking.md | Adds agent workflow for benchmarking documentation. |
| scripts/bench_metrics.sh | Adds bench log parsing script for prover timing metrics; not executed in build. |
| tooling/load_test/src/main.rs | Adds pending-nonce usage, endless mode, and env var support for load test CLI. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Prover as Prover client
    participant Coord as ProofCoordinator
    participant Store as Store/DB
    Prover->>Coord: BatchRequest{ chain_id, last_verified_batch, prover_type }
    Coord->>Store: fetch needed_proof_types + next batch/input
    alt prover_type not needed
        Coord-->>Prover: ProverTypeNotNeeded
    else proof already exists
        Coord-->>Prover: ProofAlreadyExists
    else no batch/input available
        Coord-->>Prover: EmptyBatch / No work
    else batch assigned
        Coord-->>Prover: BatchResponse{ batch, input, ... }
        Prover->>Prover: prove() / prove_timed() (if --timed)
        Prover->>Coord: submit proof
    end
```
… `ArgAction::Set`, not `SetTrue`), shut down the prover process when the coordinator rejects its type instead of retrying forever, and cache the metrics endpoint response in bench_metrics.sh to avoid 3N HTTP requests.
…mplemented!), make send_batches_proof_to_contract private, add natspec to based contract's verifyBatches noting it has no access control, update all L2 docs replacing verifyBatch references with verifyBatches, move distributed_proving.md from docs/prover/ to docs/l2/fundamentals/ since it describes the interaction between proof coordinator, proof sender, and provers rather than prover internals, and restructure the doc to be explanation-first with the testing guide at the end.
crates/l2/prover/src/prover.rs (Outdated)

```rust
    "This prover's backend is not in the required proof types for this deployment. \
     Shutting down."
);
std::process::exit(1);
```
`std::process::exit(1)` is abrupt — it skips async runtime shutdown, pending I/O flushes, and all destructors. Since this runs inside a tokio context, consider returning an error from `request_new_input` (e.g. a dedicated `ProverTypeNotNeeded` error variant) and letting the caller in `start()` break out of the loop and shut down cleanly. Or at minimum, use the same pattern as the `_ =>` branch below: `return Err("Prover type not needed by coordinator".to_owned())`.
You were right about process::exit(1) being abrupt. I changed the behavior: instead of exiting, the prover now just logs an error and continues with the remaining proof coordinators — since different coordinators may have different configurations. So the shutdown problem is gone.
```rust
    .get_prover_input_by_batch_and_version(batch_to_prove, &commit_hash)
    .await?
else {
    send_response(stream, &ProofData::empty_batch_response()).await?;
```
This changes behavior: previously when `get_prover_input_by_batch_and_version` returned `None`, the coordinator sent `ProofData::no_batch_for_version(commit_hash)`, which caused the prover client to log a specific warning about version mismatch. Now it sends an empty batch response, which the prover interprets as "no batches to prove" — hiding the real reason (version mismatch). Was this intentional? If so, worth a comment explaining why the distinction no longer matters.
You were right that hiding the version mismatch was wrong. I rewrote the coordinator's handle_request to detect it properly. The old next_batch_to_prove_for_version scanned past version-mismatched batches, which meant by the time we reached get_prover_input_by_batch_and_version the batch didn't exist at all — so the mismatch info was lost. Now the flow is sequential early returns without scanning: resolve the next batch (1 + latest_sent), check if the proof exists, check if the batch exists (version mismatch if versions differ), then check if the input matches the version (version mismatch if not). NoBatchForVersion is now sent in both version mismatch cases. Added documentation in docs/l2/architecture/prover.md with the full protocol.
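The sequential early-return flow described here can be sketched as follows. This is a simplified, synchronous model under stated assumptions: the `Reply` enum, the stubbed storage fields, and the string prover/version tags are hypothetical stand-ins for the real coordinator types.

```rust
enum Reply {
    ProverTypeNotNeeded,
    ProofAlreadyExists,
    VersionMismatch,
    RetryLater,
    Batch(u64),
}

struct Coordinator {
    needed_types: Vec<&'static str>,
    latest_sent: u64,
    proofs: Vec<u64>,                  // batches whose proof we already have
    batches: Vec<(u64, &'static str)>, // (batch number, build version)
}

impl Coordinator {
    fn handle_request(&self, prover_type: &str, prover_version: &str) -> Reply {
        // 1. Prover type check: permanently reject unneeded backends.
        if !self.needed_types.iter().any(|t| *t == prover_type) {
            return Reply::ProverTypeNotNeeded;
        }
        // 2. Resolve the next batch to prove: 1 + latest_sent.
        let batch = self.latest_sent + 1;
        // 3. Proof already exists?
        if self.proofs.contains(&batch) {
            return Reply::ProofAlreadyExists;
        }
        // 4. Batch existence + version check, then 5. input version match.
        match self.batches.iter().find(|(b, _)| *b == batch) {
            None => Reply::RetryLater, // batch not built yet
            Some((_, version)) if *version != prover_version => Reply::VersionMismatch,
            Some(_) => Reply::Batch(batch),
        }
    }
}

fn main() {
    let c = Coordinator {
        needed_types: vec!["sp1"],
        latest_sent: 4,
        proofs: vec![],
        batches: vec![(5, "abc123")],
    };
    assert!(matches!(c.handle_request("sp1", "abc123"), Reply::Batch(5)));
    assert!(matches!(c.handle_request("tdx", "abc123"), Reply::ProverTypeNotNeeded));
    assert!(matches!(c.handle_request("sp1", "zzz"), Reply::VersionMismatch));
}
```

The point of the early returns is that each rejection reason is surfaced explicitly, so a version mismatch can no longer be swallowed by an empty-batch response.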
```rust
type SerializedInput = StdIn;

fn prover_type(&self) -> ProverType {
    unimplemented!("OpenVM is not yet enabled as a backend for the L2")
```
nit: unimplemented!() panics at runtime. Since prover_type() is called on every BatchRequest (via self.backend.prover_type() in request_new_input), anyone who accidentally starts a prover with --backend openvm will get a panic instead of a clean error. This is fine for now since OpenVM isn't wired up, but worth knowing — the trait signature (-> ProverType instead of -> Result<ProverType, _>) makes this the only option short of changing the trait.
Acknowledged — keeping `unimplemented!()` intentionally since changing the trait signature to `Result` would add complexity to every backend for a case that can't happen in practice (these backends are feature-gated and not wired for L2 yet).
…tect version mismatches instead of hiding them behind empty responses. The old next_batch_to_prove_for_version scanned past version-mismatched batches, so by the time we reached the input lookup the mismatch info was lost. Now handle_request uses sequential early returns without scanning: resolve the next batch (1 + latest_sent), check if the proof exists, check if the batch exists (NoBatchForVersion if versions differ), then check if the input matches the version (NoBatchForVersion if not). Also replace std::process::exit(1) on ProverTypeNotNeeded with a clean log-and-continue: the prover skips that coordinator and moves on to the next endpoint, since different coordinators may have different configs. Introduce an InputRequest enum so request_new_input returns typed results (Batch, RetryLater, ProverTypeNotNeeded) instead of Option/exit. Document the full batch assignment protocol in docs/l2/architecture/prover.md.
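The typed-outcome refactor in this commit can be sketched as below. Variant names come from the commit message; the payload type and the `next_action` dispatcher are simplified, hypothetical stand-ins.

```rust
// Sketch of the InputRequest enum: request_new_input returns a typed
// outcome instead of overloading Option / calling process::exit.
enum InputRequest {
    Batch(u64),          // batch assigned (real code carries the full input)
    RetryLater,          // no work right now, or version mismatch: poll again
    ProverTypeNotNeeded, // this coordinator never needs our backend: skip it
}

fn next_action(req: InputRequest) -> &'static str {
    match req {
        InputRequest::Batch(_) => "prove",
        InputRequest::RetryLater => "sleep_and_retry",
        // Log an error and continue with the remaining coordinators,
        // since different coordinators may have different configs.
        InputRequest::ProverTypeNotNeeded => "skip_coordinator",
    }
}

fn main() {
    assert_eq!(next_action(InputRequest::Batch(42)), "prove");
    assert_eq!(next_action(InputRequest::ProverTypeNotNeeded), "skip_coordinator");
}
```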
…h field from the response. The old name was misleading since the returned hash was the coordinator's, not the version the batch was built with. Sometimes the prover and coordinator versions match but the stored witness version differs, so returning any specific hash is confusing. A plain mismatch signal is clearer.
Re-reviewed: all original comments addressed. process::exit replaced with continue, version mismatch now sends explicit VersionMismatch response.
## Motivation

The proof coordinator currently assigns the same batch to every prover that requests work, meaning only one prover can be active at a time. This is a bottleneck when multiple provers are available. Additionally, the proof sender verifies one batch per L1 transaction even when multiple proofs are ready, wasting gas on separate transactions.

## Description

**Proof Coordinator — distributed batch assignment:**

- Track in-flight batch assignments with timestamps using `Arc<std::sync::Mutex<HashMap<(u64, ProverType), Instant>>>` (two-phase lock pattern: brief mutex for scan+assign, storage validation outside lock)
- When a prover requests work, assign the first unassigned or timed-out batch — different provers get different batches
- Clean up assignments when all proof types arrive for a batch or when batches are verified on-chain
- New CLI flag `--proof-coordinator.prover-timeout` (default 600s, env `ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT`) controls stale assignment timeout

**L1 Proof Sender — multi-batch verification:**

- Collect all consecutive proven batches from `last_verified_batch + 1` and send them in a single `verifyBatches()` transaction
- Always uses `verifyBatches()` (with a single-element array when only one batch is ready)
- On any multi-batch error, fall back to per-batch sending — this prevents the sequencer from getting stuck on gas limit or calldata size issues (see #6173 for adding a proper cap)
- On invalid proof revert during single-batch fallback, delete the offending proof from the store
- Invalid proof detection matches both full error messages (based contract) and error codes (standard contract); see #6098 for normalizing these across contracts

**OnChainProposer contracts (standard + based):**

- Extract shared verification logic into `_verifyBatchInternal()` to avoid code duplication
- Add `verifyBatches(uint256, bytes[], bytes[], bytes[])` that loops over `_verifyBatchInternal()`
- Use `calldata` instead of `memory` for proof array parameters in external functions, avoiding unnecessary calldata-to-memory copies (consistent with `verifyBatchesAligned`)
- Critical ordering preserved: `_getPublicInputsFromCommitment` called before `lastVerifiedBatch` update
- Based contract now enforces sequential verification (`batchNumber == lastVerifiedBatch + 1`), fixing a pre-existing gap
- Timelock and interface updates for `verifyBatches`

**Metrics & Grafana:**

- Add `tx_hash` label to `batch_verification_gas` metric so batches verified in the same multi-batch tx share the same gas value and tx hash
- New "Verification Gas by Batch" xychart panel (batch_number on X, gas on Y, tx_hash in tooltip)

<img width="1363" height="671" alt="image" src="https://github.com/user-attachments/assets/59c9175e-cf3d-41a8-8ecb-881965e9030e" />

**Aligned mode is unchanged** — it already supports multi-batch via `L1ProofVerifier`.

**Note:** The `prover_type()` method on `ProverBackend` and the `prover_type` field in `BatchRequest` overlap with #6157. OpenVM and ZisK use `unimplemented!()` since they are not yet enabled as L2 backends. Whichever PR lands first, the other will resolve on rebase.

## Checklist

- [ ] Updated `STORE_SCHEMA_VERSION` (crates/storage/lib.rs) if the PR includes breaking changes to the `Store` requiring a re-sync.

Co-authored-by: Ivan Litteri <67517699+ilitteri@users.noreply.github.com>
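The two-phase lock pattern described above (brief mutex for the scan+assign step, validation outside the lock) can be sketched as follows. This is a simplified model under stated assumptions: `assign_batch`, the string prover tag, and the candidate list are hypothetical stand-ins for the coordinator's real types.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// In-flight assignments keyed by (batch number, prover type tag).
type Assignments = Arc<Mutex<HashMap<(u64, &'static str), Instant>>>;

// Hold the mutex only for the scan+assign step; any storage validation
// happens outside the lock. Assign the first unassigned or timed-out batch.
fn assign_batch(
    assignments: &Assignments,
    candidates: &[u64],
    prover: &'static str,
    timeout: Duration,
) -> Option<u64> {
    let mut map = assignments.lock().unwrap();
    for &batch in candidates {
        match map.get(&(batch, prover)) {
            Some(t) if t.elapsed() < timeout => continue, // still in flight
            _ => {
                map.insert((batch, prover), Instant::now());
                return Some(batch);
            }
        }
    }
    None
}

fn main() {
    let a: Assignments = Arc::new(Mutex::new(HashMap::new()));
    let timeout = Duration::from_secs(600);
    assert_eq!(assign_batch(&a, &[10, 11], "sp1", timeout), Some(10));
    // The second request gets the next batch, not the in-flight one.
    assert_eq!(assign_batch(&a, &[10, 11], "sp1", timeout), Some(11));
    // All candidates are in flight: no work right now.
    assert_eq!(assign_batch(&a, &[10, 11], "sp1", timeout), None);
}
```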
Motivation
Add tooling to make prover benchmarking workflows reproducible and ergonomic: structured timing logs, a results collection script, load test improvements, and documentation. Also rewrite the proof coordinator's batch assignment logic to properly detect and surface version mismatches and prover type rejections.
Description
Proof coordinator rewrite:

- Rewrite `handle_request` with a clear 5-step batch assignment protocol: (1) prover type check, (2) resolve next batch, (3) proof already exists, (4) batch existence + version check, (5) input version match. Each step is a sequential early return with comments explaining the logic.
- Rename `NoBatchForVersion` to `VersionMismatch` and remove the `commit_hash` field. The old name was misleading, and the returned hash was confusing since sometimes the prover and coordinator versions match but the stored witness version differs.
- Add a `ProverTypeNotNeeded` response so the coordinator can permanently reject provers whose backend isn't needed for the deployment. The prover logs an error and skips that coordinator, continuing with others.
- Introduce an `InputRequest` enum (`Batch`, `RetryLater`, `ProverTypeNotNeeded`) so `request_new_input` returns typed outcomes instead of overloading `Option`/errors. This replaces the `process::exit(1)` that fired when the coordinator rejected the prover's type.
- Remove the `all_proofs_exist` loop and `contains_batch` check from `handle_request`.
- Document the protocol in `docs/l2/architecture/prover.md` with step-by-step flow and prover-side handling table.

Bug fixes:

- Fix the off-by-one in the `batch_size` metric (a single-block batch was reported as size 0)
- Fix the `--sp1` flag in the Makefile to pass an explicit `true` value (the arg uses `ArgAction::Set`, not `SetTrue`)

Load test improvements:
- Add the `--endless` flag for continuous load generation
- Env var configuration: `LOAD_TEST_RPC_URL`, `LOAD_TEST_TX_AMOUNT`, `LOAD_TEST_ENDLESS`

Prover instrumentation:
- Structured timing logs (`prove_timed`) and a log-parsing benchmark script (`scripts/bench_metrics.sh`)
- Add a `--timed` flag to the prover CLI to control whether proving time is measured (default: disabled). Set via the `PROVER_CLIENT_TIMED` env var.

Proof coordinator (other):
- Add a `prover_type` field to `BatchRequest` so the coordinator knows which backend the client runs
- Add a `prover_type()` method to the `ProverBackend` trait

Documentation:
- User guide (`docs/l2/prover-benchmarking.md`)
- Agent workflow (`docs/workflows/prover_benchmarking.md`)

Checklist
- [ ] Updated `STORE_SCHEMA_VERSION` (crates/storage/lib.rs) if the PR includes breaking changes to the `Store` requiring a re-sync.