perf: latency/token experiment — plan + P0 harness by ivanmkc · Pull Request #143 · ivanmkc/termchart

ivanmkc · 2026-06-07T16:16:27Z

Summary

Foundation for the latency + token-reduction experiment. Tracks #142.

Plan: docs/plans/2026-06-07-latency-token-experimentation-plan.md
P0 harness (scripts/experiments/): agent_run.py (Tier B orchestrator, --dry-run safe), metrics.py (RunRecord + median/p95/bootstrap CI + proxy-log parse), config.py (runners/conditions/tasks), aggregate.py, pilot task suite, LiteLLM proxy config + spend logger, GCE vm/startup.sh + vm/provision.sh.
Tier A: corpus_run.py --metrics-out emits per-render JSONL.
Tests: 20 pytest cases (dry-run, no network) green.
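
The summary statistics listed for metrics.py (median, p95, bootstrap CI) can be illustrated with a minimal sketch. This is not the code in the PR: the function names `percentile` and `bootstrap_ci`, the nearest-rank percentile definition, the percentile-bootstrap method, and the sample latencies are all assumptions made for illustration.

```python
import math
import random
import statistics


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100] (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def bootstrap_ci(values, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same interval
    n = len(values)
    # Resample with replacement, recompute the statistic, then sort the results.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Made-up per-run latencies in seconds, with one slow outlier.
latencies_s = [4.1, 3.8, 5.2, 4.4, 9.7, 4.0, 4.6, 3.9, 4.3, 4.8]

median_s = statistics.median(latencies_s)
p95_s = percentile(latencies_s, 95)
ci_lo, ci_hi = bootstrap_ci(latencies_s)
```

With only a handful of runs per cell, the interval on the median is usually more informative than p95, which is dominated by a single outlier at this sample size.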

Not yet

No code-path changes to the published CLI/viewer/plugin — measurement scaffolding only.
Real runner invocation + rubric judge land in P1; AGY headless is the flagged risk.

Decisions baked in

One shared model across all runners.
Pilot-first ({baseline, c1}) before the full ablation.

Experiment to reduce diagram-generation latency and token usage across three runners (Claude Code, AGY, OpenCode), measured on VMs by comparing a baseline build against a combined-fixes build. Decisions baked in: one shared model across runners, and a pilot pass before the full ablation.

- corpus_run.py: --metrics-out emits per-render JSONL (Tier A spine) + test
- scripts/experiments: agent_run (Tier B orchestrator, --dry-run safe), metrics (RunRecord, median/p95/bootstrap CI, proxy-log parse), config (runners/conditions/tasks), aggregate (pool per-VM streams)
- pilot task suite, LiteLLM proxy config + spend logger callback
- GCE vm/startup.sh + vm/provision.sh (env-parameterized)
- pytest suite (20 tests, dry-run, no network)

…al slice)

- podman/: Containerfile (node+python+claude-code+opencode+litellm), run_local.sh (proxy container + per-cell containers + aggregate), entrypoint_cell.sh (build termchart per condition, run runner headless as non-root node, emit RunRecord)
- proxy: route shared-model -> Vertex Gemini 2.5 Flash via ADC; clean spend-log usage; Claude is a one-line EXPERIMENT_MODEL flip once Model Garden is enabled
- metrics: spend-log slice correlation (parse_proxy_log_slice, count_log_lines)
- cell_record.py: per-cell RunRecord from runner output + proxy spend slice
- proven: Claude Code in-container draws an ER diagram via termchart end-to-end on Vertex Gemini (57.7k in / 3.0k out tokens captured)

Refs #142
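
The spend-log slice correlation mentioned above can be sketched as follows: record the log's line count before a cell runs, then attribute every line appended afterwards to that cell. This is only an illustration of the idea. The helper names `count_lines` and `sum_usage_in_slice`, the JSONL layout, and the `prompt_tokens` / `completion_tokens` field names are assumptions, not the actual signatures or log format of `count_log_lines` and `parse_proxy_log_slice` in the PR.

```python
import json
import os
import tempfile


def count_lines(path):
    """Number of lines currently in the log (0 if the file does not exist yet)."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as fh:
        return sum(1 for _ in fh)


def sum_usage_in_slice(path, start_line):
    """Sum token usage over the log lines appended after `start_line`."""
    totals = {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0}
    with open(path, encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            if index < start_line or not line.strip():
                continue  # skip lines from earlier cells and blank lines
            entry = json.loads(line)
            totals["calls"] += 1
            totals["prompt_tokens"] += entry.get("prompt_tokens", 0)
            totals["completion_tokens"] += entry.get("completion_tokens", 0)
    return totals


# Demo with a throwaway log: one pre-existing call, then a "cell" that makes two.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "spend.jsonl")
    with open(log, "w", encoding="utf-8") as fh:
        fh.write(json.dumps({"prompt_tokens": 10, "completion_tokens": 2}) + "\n")

    before = count_lines(log)  # snapshot taken just before the cell starts

    with open(log, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"prompt_tokens": 100, "completion_tokens": 15}) + "\n")
        fh.write(json.dumps({"prompt_tokens": 200, "completion_tokens": 25}) + "\n")

    usage = sum_usage_in_slice(log, before)
```

A line-offset slice like this only attributes usage correctly if cells sharing a log run one at a time; concurrent cells would need a per-request tag instead.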

When TERMCHART_VIEWER_URL/TOKEN are unset, push and status return EXIT_NO_VIEWER=4 (packages/cli/src/viewer-detect.ts:15), and the message is '…are not set: no termchart viewer configured.' AGENTS.md claimed exit 3 with a non-matching hint, which can mislead an agent into a wrong retry path.
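
To show why the documented exit code matters to a caller, here is a minimal sketch of an agent-side wrapper that treats exit 4 as a configuration problem rather than a retryable failure. The value 4 comes from the description above; the function names, the labels, and the stand-in command are hypothetical, and the real CLI is not invoked.

```python
import subprocess
import sys

EXIT_NO_VIEWER = 4  # value stated in this PR (packages/cli/src/viewer-detect.ts:15)


def classify_exit(code):
    """Map a process exit code to a coarse outcome label (labels are illustrative)."""
    if code == 0:
        return "ok"
    if code == EXIT_NO_VIEWER:
        return "no-viewer"  # viewer env vars unset: retrying will not help
    return "error"


def run_and_classify(argv):
    """Run a command and classify its exit code."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return classify_exit(proc.returncode)


# Stand-in for the real CLI: a child Python process that exits with code 4.
fake_cli = [sys.executable, "-c", "import sys; sys.exit(4)"]
outcome = run_and_classify(fake_cli)
```

An agent that had been told to expect exit 3 would fall through to the generic "error" branch here, which is the wrong-retry-path risk the fix addresses.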

The diagram-recipes examples are loaded verbatim into agent context when an example is adapted. Pretty-printed, they were ~298 KB (the two *-matrix trees alone were ~89 KB across ~2,800 lines). Minifying to compact JSON preserves the data exactly while using ~45% fewer bytes (305,190 -> 167,886), which cuts the tokens an agent spends to load an example. The files are still valid JSON; flow-geometry.test.ts reads them with JSON.parse, so it is unaffected. This is fix T1 from the latency/token experiment plan. Refs #142.
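
The minification step can be sketched in a few lines. This is not the script used in the PR; `minify_json` is a hypothetical helper and the sample document is made up. It assumes the examples contain no duplicate keys or unusual float literals, which a parse-and-reserialize round trip could otherwise alter.

```python
import json


def minify_json(text):
    """Re-serialize JSON with no insignificant whitespace (hypothetical helper)."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


# A small pretty-printed document standing in for one example file.
pretty = json.dumps(
    {"nodes": [{"id": "a"}, {"id": "b"}], "edges": [["a", "b"]]},
    indent=2,
)
compact = minify_json(pretty)

# The parsed data is unchanged; only the byte count differs.
same_data = json.loads(compact) == json.loads(pretty)
saved_bytes = len(pretty.encode("utf-8")) - len(compact.encode("utf-8"))
```

Checking `same_data` on every file before committing is a cheap guard that the change only affects size.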

… gate

- entrypoint_cell.sh: OpenCode provider config (openai baseURL->proxy, --model openai/shared-model); capture runner exit code
- cell_record.py: success = clean exit + >=1 model call (runner-agnostic; OpenCode emits text not Claude JSON)
- README: runner status table (Claude Code + OpenCode working; AGY deferred - no custom base-URL to share the proxy/model) + matrix command

Refs #142
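
The runner-agnostic success rule (clean exit plus at least one model call) can be written down as a small predicate. This is a sketch of the rule as stated in the commit message, not the code in cell_record.py; the `CellOutcome` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellOutcome:
    exit_code: int    # exit code captured from the runner process
    model_calls: int  # calls attributed to this cell in the proxy spend log


def is_success(outcome):
    """Success = clean exit and at least one model call, regardless of runner output format."""
    return outcome.exit_code == 0 and outcome.model_calls >= 1


ran_fine = is_success(CellOutcome(exit_code=0, model_calls=3))
no_calls = is_success(CellOutcome(exit_code=0, model_calls=0))
crashed = is_success(CellOutcome(exit_code=1, model_calls=2))
```

Relying on the exit code and the proxy log rather than parsing runner output is what lets the same check cover runners that emit plain text and runners that emit JSON.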

ivanmkc-google added 7 commits June 7, 2026 12:13

merge: integrate T5 (AGENTS exit-code) + T1 (minify examples) into c1

58edc17

This was referenced Jun 7, 2026

Reduce termchart diagram-generation latency + tokens (experiment) #142

Open

Experiment: latency/token reduction for diagram generation — pilot findings #151

Open

perf(experiments): bench harness + P1 correctness/fidelity judge + edit flow #168

Open

ivanmkc wants to merge 7 commits into master from perf/reduce-latency

2 participants
