feat(benchmark): add fgprof, block/mutex profiling and improve profile docs #2886
Open
pdrobnjak wants to merge 9 commits into pd/benchmark-compare
…e docs

Add wall-clock profiling (fgprof) alongside standard CPU profiling to capture off-CPU time (I/O, blocking, GC pauses). Register the fgprof handler behind the benchmark build tag so production binaries are unaffected.

Enable block and mutex contention profiling via runtime calls, also gated behind the benchmark build tag. Use conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize overhead on TPS.

Update benchmark-compare.sh to capture all 6 profile types (CPU, fgprof, heap, goroutine, block, mutex) and report sizes for each.

Expand benchmark/CLAUDE.md with:
- Profile type reference table with when-to-use guidance
- CPU vs fgprof explanation
- Heap metric selection guide (inuse_space vs alloc_objects etc)
- Interactive flamegraph and drill-down commands
- Single-scenario manual capture examples
- Source-mapping tip for pprof

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
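For a manual capture of the six profile types named in this commit, the endpoint for each one could be built roughly as follows. This is a sketch under assumptions, not the script from the PR: it assumes the node serves Go's standard `net/http/pprof` routes under `/debug/pprof/` and that fgprof is mounted at `/debug/fgprof` (the path shown in fgprof's README). The address `localhost:6060`, the `PPROF_BASE` variable, and the helper name `profile_url` are hypothetical.

```shell
# Hypothetical helper: print the URL for one profile type.
# Assumes standard net/http/pprof routes; the fgprof path is an assumption.
profile_url() {
  base="${PPROF_BASE:-http://localhost:6060}"   # assumed pprof address
  kind="$1"
  seconds="${2:-30}"                            # only used by timed profiles
  case "$kind" in
    cpu)    echo "$base/debug/pprof/profile?seconds=$seconds" ;;
    fgprof) echo "$base/debug/fgprof?seconds=$seconds" ;;
    heap|goroutine|block|mutex) echo "$base/debug/pprof/$kind" ;;
    *)      echo "unknown profile type: $kind" >&2; return 1 ;;
  esac
}
```

A capture would then look like `curl -s -o cpu.pb.gz "$(profile_url cpu 30)"`. CPU and fgprof are timed captures, while heap, goroutine, block, and mutex return a snapshot immediately.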
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## pd/benchmark-compare #2886 +/- ##
========================================================
- Coverage 57.18% 57.17% -0.01%
========================================================
Files 2091 2091
Lines 171179 171173 -6
========================================================
- Hits 97891 97872 -19
- Misses 64578 64593 +15
+ Partials 8710 8708 -2
…extraction

benchmark.sh now runs for DURATION seconds (default 120), auto-captures all 6 profile types midway, extracts TPS stats, and exits cleanly. DURATION=0 preserves the original run-forever behavior.

Also documents the full optimization loop workflow in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
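The TPS extraction step could be summarized along these lines. This is a minimal sketch that assumes the readings have already been pulled out of the node log as one number per line; how `benchmark.sh` actually parses the log is not shown here, and the function name `tps_stats` is hypothetical.

```shell
# Hypothetical helper: read one TPS reading per line on stdin and print
# median, average, minimum, and maximum.
tps_stats() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }          # input is sorted, so v[1] is the min
    END {
      if (NR == 0) { print "no readings" > "/dev/stderr"; exit 1 }
      if (NR % 2) median = v[(NR + 1) / 2]
      else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "median=%.1f avg=%.1f min=%.1f max=%.1f\n", median, sum / NR, v[1], v[NR]
    }'
}
```

For example, `printf '%s\n' 10 20 60 | tps_stats` prints `median=20.0 avg=30.0 min=10.0 max=60.0`. Reporting the median next to the average keeps a single slow block from dominating the summary.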
…on loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ion loop

Adds a Claude Code command that runs a structured optimization workflow: profile -> analyze -> discuss -> implement -> compare -> validate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… corruption

Go's CPU profiler and fgprof conflict when running concurrently on the same process, producing empty or corrupted profiles. Switch from parallel background captures to sequential execution (CPU first, then fgprof), measure actual capture duration for accurate remaining-time calculation, and make BASE_DIR overridable so benchmark-compare.sh can route profiles to per-label directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
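The sequential capture and the remaining-time calculation described in this commit might be structured as in the sketch below. The function names and the way capture commands are passed in are hypothetical; the only points taken from the commit are that captures run one after another and that the measured capture time is subtracted from the run.

```shell
# Run each capture command in order, never in parallel, and print the total
# number of seconds spent. Command output goes to stderr so that stdout
# carries only the elapsed time.
capture_sequentially() {
  start=$(date +%s)
  for cmd in "$@"; do
    sh -c "$cmd" >&2 || echo "capture failed: $cmd" >&2
  done
  end=$(date +%s)
  echo $(( end - start ))
}

# Seconds left in the run after a capture, clamped at zero.
# Arguments: total duration, seconds elapsed before capture, capture time.
remaining_seconds() {
  left=$(( $1 - $2 - $3 ))
  [ "$left" -gt 0 ] || left=0
  echo "$left"
}
```

With a 120 second run, a capture started at 60 seconds that took 35 seconds leaves `remaining_seconds 120 60 35`, which is 25. Using the measured value instead of the nominal profile length keeps the total run time close to `DURATION` even when a capture is slow.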
## Summary
- Namespace `BASE_DIR` per run via `RUN_ID` (defaults to PID): `/tmp/sei-bench-${RUN_ID}/`
- Auto-claim port offset slots via atomic `mkdir` (supports 30 concurrent runs, zero coordination)
- Replace `git checkout` with git worktrees for isolated builds (no working tree collisions)
- Replace `~/go/bin/seid` with `GOBIN`-based builds per label (no binary collisions)
- Replace `~/.sei` staging with `mktemp` + `--home` on all `seid` commands (no init collisions)
- Pass `SEI_HOME_DIR`/`SEID_BIN` env vars to `populate_genesis_accounts.py` (backward-compatible defaults)
- Fix pre-existing double lifecycle bug by passing `DURATION=0` to child start-phase
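The atomic `mkdir` slot claim listed above can be sketched as follows. The lock directory layout, the `slot-N` naming, and the function names are hypothetical; the technique itself relies only on `mkdir` failing when the directory already exists, so two processes racing for the same slot cannot both win.

```shell
# Claim the first free slot by creating its lock directory.
# Arguments: lock root directory (hypothetical), maximum slot count.
claim_slot() {
  lock_root="$1"
  max_slots="${2:-30}"                 # 30 concurrent runs, per the summary
  mkdir -p "$lock_root" || return 1
  slot=0
  while [ "$slot" -lt "$max_slots" ]; do
    if mkdir "$lock_root/slot-$slot" 2>/dev/null; then
      echo "$slot"                     # this process now owns the slot
      return 0
    fi
    slot=$(( slot + 1 ))
  done
  echo "no free slot" >&2
  return 1
}

# Release a slot so a later run can claim it.
release_slot() {
  rmdir "$1/slot-$2" 2>/dev/null
}
```

A run would call `slot=$(claim_slot /tmp/sei-bench-locks)` once at startup and `release_slot /tmp/sei-bench-locks "$slot"` during cleanup. No lock file contents or PID checks are needed, which is what makes the scheme coordination-free.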
## Test plan
- [x] Syntax check: `bash -n` on both shell scripts, `py_compile` on Python
- [x] Two concurrent `benchmark-compare.sh` runs with `DURATION=120`: both completed, separate `BASE_DIR`s, no port conflicts
- [x] All 6 profile types captured for all 4 nodes (CPU ~145KB, fgprof ~115KB, heap ~248KB, etc.)
- [x] TPS data collected (36-37 readings per node)
- [x] `pprof -diff_base` produces valid analyzable output from both runs
- [x] Port slot locks cleaned up after exit
- [x] Git worktrees cleaned up after exit
- [x] Backward compatible: no env vars needed for single-instance usage
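The "all 6 profile types captured" check from the test plan could be automated with something like the sketch below. The `<type>.pb.gz` file naming is an assumption made for illustration; the actual file names written by the scripts are not shown in this PR text.

```shell
# Report the size of each expected profile and fail if any is missing or
# empty. The file naming scheme is hypothetical.
check_profiles() {
  dir="$1"
  missing=0
  for kind in cpu fgprof heap goroutine block mutex; do
    f="$dir/$kind.pb.gz"
    if [ -s "$f" ]; then               # -s: file exists and is non-empty
      echo "$kind: $(wc -c < "$f" | tr -d ' ') bytes"
    else
      echo "$kind: missing or empty" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An empty CPU or fgprof file is the symptom described for concurrent captures, so a non-zero size check catches that failure mode early.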
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- `benchmark.sh` now auto-claims a port offset slot (same atomic `mkdir` mechanism as `benchmark-compare.sh`) when `PORT_OFFSET` is not explicitly set
- Prevents port collisions between concurrent standalone `benchmark.sh` runs and stale `seid` processes from crashed runs
- When auto-claiming, `SEI_HOME` is also isolated to `$HOME/.sei-bench-<offset>` to avoid data directory collisions
- Port slot is released in all exit paths (staging cleanup, seid cleanup trap, and normal exit)
- When `PORT_OFFSET` is explicitly set by the caller (e.g., from `benchmark-compare.sh`), behavior is unchanged
## Test plan
- [x] Run two concurrent `benchmark.sh` invocations: both should auto-claim different port slots and run without collisions
- [x] Run `benchmark-compare.sh` (which passes explicit `PORT_OFFSET`): should still work as before
- [x] Kill a `benchmark.sh` mid-run: port slot should be released by the trap handler
- [x] Syntax check: `bash -n benchmark/benchmark.sh`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
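Releasing the slot on every exit path is commonly done with a `trap`. The sketch below shows one way to structure it, assuming a lock directory per slot; the wrapper name `with_port_slot` and the lock path are hypothetical, and the real script spreads its cleanup over several handlers instead of a single wrapper.

```shell
# Hold a lock directory while a command runs and remove it afterwards,
# whether the command succeeds, fails, or the subshell gets INT/TERM.
with_port_slot() {
  slot_dir="$1"; shift
  (
    mkdir "$slot_dir" 2>/dev/null || { echo "slot busy: $slot_dir" >&2; exit 1; }
    trap 'rmdir "$slot_dir" 2>/dev/null' EXIT   # runs on every exit path
    trap 'exit 130' INT TERM                    # turn signals into an exit
    "$@"
    status=$?
    exit "$status"                              # keep the command's status
  )
}
```

The lock is created before the trap is installed, so a run that finds the slot busy never removes a lock owned by another process.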
## Summary
- `benchmark-compare.sh` slot 0 previously mapped to `RUN_PORT_OFFSET=0`, which uses the same default ports as standalone `benchmark.sh` (offset 0)
- If a stale `seid` from a previous standalone run is holding those ports, the baseline node in a compare run panics with `bind: address already in use`
- Fix: change slot-to-offset mapping from `slot * 1000` to `1000 + slot * 1000`, so compare runs start at offset 1000+ and never overlap with standalone benchmark default ports

Complements #2900 which added auto-claim port offsets to `benchmark.sh` itself.
## Test plan
- [x] Run `benchmark-compare.sh`: first slot should claim offset 1000, not 0
- [x] Run standalone `benchmark.sh` concurrently with `benchmark-compare.sh`: no port collisions
- [x] Multiple concurrent `benchmark-compare.sh` invocations still auto-claim non-overlapping slots
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
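The new slot-to-offset mapping is small enough to state directly. The function name below is hypothetical; the formula is the one given in the summary.

```shell
# Map a claimed slot to a port offset.
# Old mapping: slot * 1000        (slot 0 collided with standalone runs)
# New mapping: 1000 + slot * 1000 (offset 0 is left to benchmark.sh)
slot_to_offset() {
  echo $(( 1000 + $1 * 1000 ))
}
```

Slot 0 now yields 1000 and slot 2 yields 3000, so no compare run ever lands on offset 0.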
arajasek approved these changes Feb 18, 2026
## Summary
- fgprof (wall-clock profiling) is registered behind the `benchmark` build tag only; no production impact.
- Block and mutex contention profiling is enabled via `runtime.SetBlockProfileRate` and `runtime.SetMutexProfileFraction`, also gated behind the `benchmark` build tag. Uses conservative sampling rates (1us block threshold, 1/5 mutex fraction) to minimize TPS measurement overhead.
- `DURATION` support in `benchmark.sh` (default 120s, 0 = run forever). When set, runs seid in the background, auto-captures all 6 profiles midway, extracts TPS stats (median/avg/min/max), and exits cleanly, enabling fully automated single-scenario profiling.
- Profile captures run sequentially (CPU first, then fgprof): Go's CPU profiler (`SIGPROF`) and fgprof (`runtime.GoroutineProfile`) conflict when running concurrently on the same process, producing empty or corrupted profiles. Also measure actual capture duration for accurate remaining-time calculation, and make `BASE_DIR` overridable so `benchmark-compare.sh` routes profiles to per-label directories correctly.
## Test plan
- `go build ./sei-tendermint/node/` succeeds (no fgprof in non-benchmark build)
- `go build -tags benchmark ./sei-tendermint/node/` succeeds (fgprof registered)
- `go build -tags benchmark ./app/` succeeds (block/mutex profiling enabled)
- Run `DURATION=90 benchmark/benchmark.sh` and confirm auto-stop, profile capture, and TPS extraction
- Profiles land in `/tmp/sei-bench/pprof/` with non-zero sizes
- `go tool pprof -top` works on each captured profile type
🤖 Generated with Claude Code